Meta is developing tools to identify images synthetically produced by generative AI systems at scale across its social media platforms, including Facebook, Instagram, and Threads, the company said on Tuesday.
“…we’ve been working with industry partners to align on common technical standards that signal when a piece of content has been created using AI. Being able to detect these signals will make it possible for us to label AI-generated images that users post to Facebook, Instagram and Threads,” Nick Clegg, president of global affairs at Meta, wrote in a blog post.
“We’re building this capability now, and in the coming months we’ll start applying labels in all languages supported by each app,” Clegg added.
The move to label AI-generated images from companies such as Google, OpenAI, Adobe, Shutterstock, and Midjourney is significant because 2024 will see elections in several countries and regions, including the US, the EU, India, and South Africa.
This year will also see Meta learning more about how users are creating and sharing AI-generated content and what kind of transparency netizens find valuable, Clegg said.
Clegg’s statement about elections serves as a reminder of the Cambridge Analytica scandal, unearthed by The New York Times and The Observer in 2018, in which the Facebook data of at least 50 million users was compromised.
Last month, ChatGPT-maker OpenAI suspended two developers who created a bot mimicking Democratic presidential hopeful Congressman Dean Phillips, marking the company’s first action against the misuse of AI.
No labels for AI-generated audio and video
Meta, according to Clegg, already marks images created by its own AI feature, attaching both visible markers and invisible watermarks. The invisible watermarks are embedded within the image’s metadata.
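Clegg doesn’t detail the metadata format, but industry provenance efforts (such as the IPTC’s “digital source type” property) embed exactly this kind of tag in an image’s metadata. The sketch below is purely illustrative, assuming the metadata has already been parsed into a dictionary; the key names follow the IPTC convention, not any published Meta schema:

```python
# Hypothetical check for AI-provenance tags in parsed image metadata.
# The field name follows the IPTC DigitalSourceType convention; the exact
# signals Meta reads are not public, so treat this as an illustration only.

AI_SOURCE_TYPES = {
    "trainedAlgorithmicMedia",               # fully AI-generated
    "compositeWithTrainedAlgorithmicMedia",  # partly AI-generated
}

def looks_ai_generated(metadata: dict) -> bool:
    """Return True if the parsed metadata carries an AI-provenance tag."""
    source_type = metadata.get("Iptc4xmpExt:DigitalSourceType", "")
    # The value is typically a controlled-vocabulary URI ending in the token.
    return source_type.rsplit("/", 1)[-1] in AI_SOURCE_TYPES

# Example: a tag a generator might attach to its output
meta = {
    "Iptc4xmpExt:DigitalSourceType":
        "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
}
print(looks_ai_generated(meta))  # True
```

In practice such tags are read from the image file itself (XMP/IPTC blocks), which is why stripping metadata defeats this signal, the weakness the “adversarial challenges” section below addresses.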
The combination of these watermarks makes it easy for other platforms to identify AI-generated images, Clegg said.
Meta is also working with other companies to develop common standards for identifying AI-generated images through forums like the Partnership on AI (PAI), Clegg added.
However, he also pointed out that while several companies were starting to include signals to help identify generated images, the same policy was not being applied to generated audio and video.
In the absence of identification policies around generated audio and video, Clegg said that Meta was adding a feature for people to disclose when they share AI-generated video or audio so the company can add a label to it.
“We’ll require people to use this disclosure and label tool when they post organic content with a photorealistic video or realistic-sounding audio that was digitally created or altered, and we may apply penalties if they fail to do so,” Clegg wrote in the blog.
More adversarial challenges to come
Meta acknowledged that while the tools and standards being developed are at the cutting edge of what’s possible around labeling generated content, bad actors could still find avenues to strip out invisible markers.
To counter such bad actors, the company said it was developing classifiers that can automatically detect AI-generated content even when it lacks invisible markers.
“At the same time, we’re looking for ways to make it more difficult to remove or alter invisible watermarks. For example, Meta’s AI Research lab FAIR recently shared research on an invisible watermarking technology we’re developing called Stable Signature,” Clegg wrote.
Stable Signature integrates the watermarking mechanism directly into the image-generation process for some types of image generators, which could be valuable for open-source models because the watermarking can’t be disabled, the executive explained.
The company has previously used AI systems to detect and take down hate speech and other content that violates its policies.
As of the third quarter of 2023, Meta claims that its AI systems helped reduce the prevalence of hate speech on Facebook to just 0.01-0.02%.
Meta is planning to use generative AI to take down harmful content faster. “We’ve started testing large language models (LLMs) by training them on our ‘Community Standards’ to help determine whether a piece of content violates our policies. These initial tests suggest the LLMs can perform better than existing machine learning models,” Clegg wrote.
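Meta has not published how these LLM tests are structured. A minimal sketch of the general approach, asking a model to judge content against a policy excerpt, might look like the following; the prompt format and policy wording are invented for demonstration, and the model call is replaced with a trivial stub:

```python
# Illustrative sketch of policy-based LLM moderation. The prompt layout,
# policy text, and stub model are assumptions, not Meta's actual system.

POLICY_EXCERPT = (
    "Do not post content that attacks people based on protected characteristics."
)

def build_moderation_prompt(content: str) -> str:
    """Compose a prompt asking an LLM whether content violates the policy."""
    return (
        "Policy: " + POLICY_EXCERPT + "\n"
        "Content: " + content + "\n"
        "Answer VIOLATES or ALLOWED, then give a one-line reason."
    )

def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; a keyword heuristic so the sketch runs."""
    content_part = prompt.lower().split("content: ", 1)[-1]
    return "VIOLATES" if "attack" in content_part else "ALLOWED"

verdict = stub_llm(build_moderation_prompt("A friendly photo caption"))
print(verdict)  # ALLOWED
```

In a real deployment the stub would be an actual LLM fine-tuned or prompted on the full Community Standards, and its verdict would feed into review routing rather than trigger removal on its own.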
These LLMs are also helping the company remove content from review queues in certain circumstances when its reviewers are highly confident it doesn’t violate the company’s policies, Clegg added.