Launch of sphere by meta, an AI knowledge tool based on publicly accessible web content, initially used to validate citations on Wikipedia

Anurag Sharma

Updated · Jul 14, 2022

Scoop.market.us is supported by its audience. When you purchase through links on our site, we may earn an affiliate commission. Learn more.

Facebook could be famous for its role in the rise of “fake news,” but the company has also sought to establish a presence in the ongoing fight against it. In the most recent development on that front, Facebook parent company Meta today unveiled a brand-new tool called Sphere, AI developed around the idea of drawing from the enormous informational pool on the open web to act as a knowledge base for other systems and AI. The initial use of Sphere, according to Meta, is Wikipedia, where it is being used in a production phase (not for live entries) to automatically scan entries and determine whether citations are well-supported or not.

The concept behind using Sphere to Wikipedia is simple. The online encyclopedia currently has 6.5 million entries and sees an average of 17,000 new articles each month. This wiki concept means that content can be added and edited by anyone. Although there are editors to oversee this, it is daunting because it is so large and because it has a mandate. It is also because many people and institutions rely on it for their records.
The Wikimedia Foundation overseeing Wikipedia has also been looking at new ways to leverage all of that data. It announced last month an Enterprise Tier, and its first two commercial customers are Google and the Internet Archive. These companies use Wikipedia-based data to generate their own business-generating interest.

They will now have broader and more formal service agreements.
While today’s announcements regarding Meta working with Wikipedia don’t refer to Wikimedia Enterprise in any way, potential customers of Enterprise will want to be aware that additional tools will be added to Wikipedia to verify and correct the content.
Meta has assured me that there is no financial arrangement in this deal, and neither Wikipedia nor Meta will become a paying customer of Meta’s.

The Sphere model was trained using “a novel data set (WAFER) comprising 4 million Wikipedia citations, much more intricate than ever used for this type of research,” according to Meta. There is clearly a deeper connection there as Meta publicly confirmed that Wikipedia editors were also using a new AI-based translator tool that it had developed.

Meta continues to feel the weight of a negative public perception. This is partly due to accusations that Meta allows misinformation and harmful ideas to gain ground. If you are someone who ends up in “Facebook prison”, you may believe you have shared something fine but have still fallen prey to over-zealous social media police. Although it’s a messy situation, Meta feels that Sphere is a PR exercise. If it succeeds, it will show that people are trying to work in good faith.

Meta believes that Sphere’s “white box” knowledge source has more data than any other “black box” source. This is in contrast to the traditional “black box” sources that are based on proprietary search engines. It noted that Sphere has access to far more public data than the standard models and could therefore provide valuable information that they do not have. Meta used 134 million documents to train Sphere. They were divided into 906 million passages, each containing 100 tokens.

Meta claims that open-sourcing this tool is a better foundation for AI training models than any proprietary one. It does acknowledge that the foundations of knowledge may be unstable, particularly in the early stages. What if “truth” is not being reported as widely in the same way as misinformation? Meta intends to put its future efforts into Sphere on this topic. It noted that the next step was to train models to evaluate the quality of retrieved documents and detect possible contradictions. They will also prioritize trustworthy sources. And, if there is no convincing evidence, they can still be stumped.”

This raises interesting questions about Sphere’s hierarchy and truth based on other knowledge bases. Open-sourced means that users may be able to modify the algorithms to suit their needs. (For example, a user using Sphere to verify legal references may give more credibility to court filings and case databases than a user verifying sports or fashion references. This would place a greater emphasis on other sources.
Meta confirmed that it does not use Sphere or a modified version thereof on its own platforms, such as Messenger, Instagram, and Messenger. These platforms have been long plagued by misinformation and the toxicity of bad actors. We also inquired if there were any other customers interested in Sphere. It comes with separate tools for managing its content and moderating.

It seems like something like this was designed for a huge scale. Wikipedia’s current size is undoubtedly greater than any human team could verify for accuracy. The sphere is used to scan hundreds of thousands upon hundreds of thousands of citations at once to identify when a citation isn’t having much support. It noted that Sphere would suggest a more relevant source even pointing to the exact passage supporting the claim.
This is still in the production phase, but it sounds like editors are selecting passages that might need to be verified. “Eventually, we want to create a platform that will help Wikipedia editors quickly spot citation problems and fix them or correct the article’s content at scale.

Anurag Sharma

He has been helping in business of varied scales, with key strategic decisions. He is a specialist in healthcare, medical devices, and life-science, and has accurately predicted the trends in the market. Anurag is a fervent traveller, and is passionate in exploring untouched places and locations. In his free time, he loves to introspect and plan ahead.

Latest from Author