The chatbot that millions of people have used to write term papers, computer code, and fictional stories doesn’t just do words. ChatGPT, the AI-powered tool from OpenAI, can analyze images, too — describing what’s inside them, answering questions about them, and even recognizing the faces of specific people. The hope is that eventually, someone can upload a picture of a broken car engine or a mysterious rash and ChatGPT can suggest a fix.
What OpenAI doesn’t want ChatGPT to become is a facial recognition machine.
Over the past few months, Jonathan Mosen has been among a select group of people with access to an advanced version of the chatbot that can analyze images. On a recent trip, Mr. Mosen, the chief executive of an employment agency who is blind, used the visual analysis to determine which dispensers in a hotel bathroom were shampoo, conditioner, and shower gel. It far outperformed the image analysis software he had used in the past.
“It told me the milliliters per bottle. It told me about the tiles in the bathroom,” Mr. Mosen said. “It described all of this in a way a blind person needs to hear it. And with one photo, I had exactly the answers I needed.”
For the first time, Mr. Mosen can “interrogate images,” he said. He gave an example: Text accompanying a photo he found on social media described it as a “woman with blond hair looking happy.” When he asked ChatGPT to analyze the image, the chatbot said it was a woman in a dark blue shirt taking a selfie in a full-length mirror. He could ask follow-up questions, such as what kind of shoes she was wearing and what else was visible in the mirror’s reflection.
“It’s extraordinary,” said Mr. Mosen, 54, who lives in Wellington, New Zealand, and has demonstrated the technology on a podcast he hosts about “living blindfully.”
In March, when OpenAI announced GPT-4, the latest software model powering the AI chatbot, the company said it was “multimodal,” meaning it could respond to text and image prompts. While most users have been able to converse with the bot only in words, Mr. Mosen was given early access to the visual analysis by Be My Eyes, a start-up that typically connects blind users with sighted volunteers and provides accessible customer service to corporate clients. Be My Eyes began working with OpenAI this year to test the chatbot’s visual abilities before the feature’s release to the general public.
Recently, the app stopped giving Mr. Mosen information about people’s faces, saying they had been obscured for privacy reasons. He was disappointed, feeling that he should have the same access to information as a sighted person.
The change reflected OpenAI’s concern that it had built something with a power it didn’t want to release.
The company’s technology can primarily identify public figures, such as people with a Wikipedia page, said Sandhini Agarwal, an OpenAI policy researcher, but it does not work as comprehensively as tools built to find faces on the internet, such as those from Clearview AI and PimEyes. The tool can recognize OpenAI’s chief executive, Sam Altman, in photos, Ms. Agarwal said, but not other people who work at the company.
Making such a feature available to the public would push the boundaries of what is generally considered acceptable practice by U.S. technology companies. It could also cause legal problems in jurisdictions, such as Illinois and Europe, that require companies to obtain citizens’ consent to use their biometric information, including a faceprint.
In addition, OpenAI was concerned that the tool would say things it shouldn’t about people’s faces, such as assessing their gender or emotional state. Ms. Agarwal said OpenAI is figuring out how to address these and other safety concerns before launching the image analysis feature on a large scale.
“We very much want this to be a two-way conversation with the public,” she said. “If what we hear is, ‘We actually don’t want any of it,’ that’s something we take very seriously.”
In addition to gathering feedback from Be My Eyes users, the company’s nonprofit arm is also trying to come up with ways to solicit “democratic input” to help set rules for AI systems.
Ms. Agarwal said the development of visual analysis was not “unexpected,” because the model was trained by looking at images and text collected from the internet. She noted that celebrity facial recognition software already exists, such as a tool from Google. Google offers an opt-out for well-known people who don’t want to be recognized, and OpenAI is considering that approach.
Ms. Agarwal said OpenAI’s visual analysis could produce “hallucinations” similar to those seen with text prompts. “If you give it a picture of someone on the threshold of being famous, it might hallucinate a name,” she said. “Like if I give it a picture of a famous tech CEO, it might give me a different tech CEO’s name.”
Once, Mr. Mosen said, the tool inaccurately described his remote control, confidently telling him there were buttons on it that weren’t there.
Microsoft, which has invested $10 billion in OpenAI, also has access to the visual analysis tool. Some users of Microsoft’s AI-powered Bing chatbot have seen the feature appear in a limited rollout; after uploading photos to it, they received a message telling them that “privacy blur hides faces from Bing chat.”
Sayash Kapoor, a computer scientist and doctoral candidate at Princeton University, used the tool to decode a CAPTCHA, a visual security check meant to be intelligible only to human eyes. Even as it cracked the code and recognized the two obscured words provided, the chatbot noted that “CAPTCHAs are designed to prevent automated bots like me from accessing certain websites or services.”
“AI blows through all the things that are supposed to separate humans and machines,” said Ethan Mollick, an associate professor who studies innovation and entrepreneurship at the University of Pennsylvania’s Wharton School.
Ever since the visual analysis tool suddenly appeared in Mr. Mollick’s version of the Bing chatbot last month — making him, without any notice, one of the few people with early access — he hasn’t shut down his computer for fear of losing it. He gave it a photo of condiments in a refrigerator and asked Bing to suggest recipes using those ingredients. It came up with “whipped cream soda” and a “creamy jalapeño sauce.”
Both OpenAI and Microsoft seem aware of the power — and potential privacy implications — of this technology. A Microsoft spokesperson said the company does not “share technical details” about facial blurring but is working “closely with our partners at OpenAI to support our shared commitment to the safe and responsible deployment of AI technologies.”