Meta’s new Sphere AI tool will check the accuracy of Wikipedia entries

2022-07-19 14:41:21

Researchers at Meta have created a new artificial intelligence tool, Sphere, that can access 134 million web pages and use them as a knowledge base for building AI systems. The first organisation to use the tool, which is being made available under an open-source licence, is Wikipedia, which will deploy it to scan hundreds of thousands of citations on the online encyclopedia and check that they support the corresponding claims.

Meta’s new Sphere AI tool is being used by Wikipedia to check citations. (Photo by brightstars/iStock)

The dataset that can be accessed through Sphere is an order of magnitude larger than any previously released for AI research, Meta claims. For Wikipedia, it will call attention to questionable citations, highlighting those that human editors need to evaluate and change. If the citation proves irrelevant, the model can also suggest other applicable sources of information that do back up the claim in the text.

How will Meta Sphere work with Wikipedia?

Wikipedia relies heavily on citations written in the footnotes of its articles to verify claims made within the text. Its 6.5 million articles are updated regularly by volunteers, and sometimes the citations don’t back up the claims being made.

Meta says the goal of Sphere is to give Wikipedia editors a platform that can “systematically spot citation issues” and correct the citation or the content of the corresponding article at scale, rather than requiring editors to trawl through articles manually one by one.
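
As a rough illustration of that workflow, the sketch below flags a citation that appears to give a claim little support and suggests a stronger alternative for a human editor to review. It is not Meta's method: `support_score` is a hypothetical stand-in that uses simple word overlap where Sphere uses trained neural models, and the function names, the example texts and the 0.5 threshold are all assumptions made for this example.

```python
import re


def tokens(text):
    """Lowercase word tokens, used for a crude overlap measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim, source_text):
    """Hypothetical stand-in for a learned verifier: the fraction of
    claim tokens that also appear in the source text (0.0 to 1.0)."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(source_text)) / len(claim_tokens)


def review_citation(claim, cited_text, alternatives, threshold=0.5):
    """Flag a weakly supported citation and suggest a better source.

    Returns a report for a human editor; nothing is changed automatically.
    The threshold is an arbitrary value chosen for this sketch.
    """
    score = support_score(claim, cited_text)
    if score >= threshold:
        return {"flagged": False, "score": score, "suggestion": None}
    # Pick the alternative that scores highest against the claim.
    best = max(alternatives, key=lambda s: support_score(claim, s), default=None)
    # Only suggest it if it actually beats the existing citation.
    if best is not None and support_score(claim, best) <= score:
        best = None
    return {"flagged": True, "score": score, "suggestion": best}


report = review_citation(
    claim="The Eiffel Tower was completed in 1889",
    cited_text="Paris is the capital of France",
    alternatives=[
        "The Eiffel Tower was completed in 1889 for the World's Fair",
        "Bananas are yellow",
    ],
)
print(report)
```

The report leaves the final decision with the editor, which matches the article's description of the tool highlighting citations for humans to evaluate rather than rewriting them itself.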

The tool builds on an existing Meta AI model that integrates information retrieval and verification. Developing that model involved training neural networks to learn more nuanced representations of language so that they could pinpoint source material. The latest change significantly increases the size of the pool of data the model can draw from.

This new version of the model, Sphere, references up to 134 million web pages. For Wikipedia, Meta fed it four million claims from the online encyclopedia, teaching it to zero in on a single source from the vast pool to validate each statement.

The index produced in this process passes potential sources for a Wikipedia article on to an evidence-ranking model that compares the text to the citation and determines whether the citation matches and is a viable option for inclusion in the footnotes.
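
The two-stage shape described here, an index that proposes candidate sources followed by a model that ranks them, can be sketched in a few lines. This is only a toy under stated assumptions: the inverted index and the bag-of-words cosine similarity stand in for Sphere's neural retriever and evidence-ranking model, and every name and example page below is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Stage 0: map each token to the ids of the pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for tok in set(tokenize(text)):
            index[tok].add(page_id)
    return index


def candidates(index, claim):
    """Stage 1: every page sharing at least one token with the claim."""
    found = set()
    for tok in set(tokenize(claim)):
        found |= index.get(tok, set())
    return found


def cosine(a, b):
    """Bag-of-words cosine similarity, a stand-in for a learned ranker."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_evidence(pages, index, claim, top_k=3):
    """Stage 2: order the candidates by similarity to the claim, best first."""
    scored = [(cosine(claim, pages[pid]), pid) for pid in candidates(index, claim)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, score) for score, pid in scored[:top_k]]


pages = {
    "a": "Marie Curie won the Nobel Prize in Physics in 1903",
    "b": "The Nobel Prize is awarded in Stockholm",
    "c": "Bananas are rich in potassium",
    "d": "Quantum chromodynamics",
}
index = build_index(pages)
for page_id, score in rank_evidence(pages, index, "Marie Curie won the Nobel Prize in 1903"):
    print(page_id, round(score, 3))
```

Splitting the work this way keeps the expensive comparison away from most of the pool: the cheap index lookup narrows the pages down to a handful of candidates, and only those are scored against the claim.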

“Usually, to develop models like this, the input might be just a sentence or two,” a Meta statement said. “We trained our models with complicated statements from Wikipedia, accompanied by full websites that may or may not support the claims. As a result, our models have achieved a leap in performance in terms of detecting the accuracy of citations.”

The implications of Sphere for Meta and Wikimedia Enterprise

All of this work improves Sphere and could in turn enable new AI systems that can make sense of the real world, according to Meta.

“Open source projects like these, which teach algorithms to understand dense material with an ever-higher degree of sophistication, help AI make sense of the real world,” a Meta blog post said. “While we can’t yet design a computer system that has a human-level comprehension of language, our research creates smarter, more flexible algorithms. This improvement will only become more important as we rely on computers to interpret the surging volume of text citations generated each day.”

Indeed, the company also argues that the breadth of sources used by Sphere means it provides more accurate results than other comparable systems. “Because Sphere can access far more public information than today’s standard models, it could provide useful information that they cannot,” the blog post added.

For the Wikimedia Foundation, the non-profit organisation which oversees Wikipedia, ensuring accuracy is more important than ever before. Last month it launched Wikimedia Enterprise, a commercial product for businesses that require a high level of access to its databases.

Shani Evenstein Sigalov, a lecturer and researcher at Tel Aviv University and vice-chair of the Wikimedia Foundation’s board of trustees, described the new technique as a “powerful example of machine learning tools that can help scale the work of volunteers.”

“Improving these processes will allow us to attract new editors to Wikipedia and provide better, more reliable information to billions of people around the world,” she said.
