Tech giant OpenAI has touted its speech-to-text tool Whisper as an AI with “human-like accuracy and robustness.” But Whisper has one major flaw: it fabricates chunks of text, even entire sentences.
Some of this made-up, AI-generated text, known as “hallucinations,” includes racial commentary, violent language and even imaginary medical treatments, according to AP.
High rate of "hallucinations" in AI-generated text
Experts are particularly concerned because Whisper is widely used in many industries around the world to translate and transcribe interviews, generate text in popular consumer technologies, and create subtitles for videos.
More worryingly, many medical centers are using Whisper to transcribe consultations between doctors and patients, even though OpenAI has warned that the tool should not be used in “high-risk” areas.
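For context, the transcription and subtitling workflows described above typically look something like the following minimal sketch, which assumes the open-source "whisper" Python package (installed as openai-whisper); the file name "interview.mp3" and the model size are placeholders, not details from the article.

    # Minimal sketch of a Whisper transcription workflow (assumes: pip install openai-whisper).
    import whisper

    # Load a pretrained Whisper model; sizes range from "tiny" to "large".
    model = whisper.load_model("base")

    # Transcribe an audio file; the result contains the full text plus timed segments.
    result = model.transcribe("interview.mp3")

    # Full transcript: this is the output in which fabricated passages can appear.
    print(result["text"])

    # Timed segments like these are what subtitle generation relies on.
    for segment in result["segments"]:
        print(f'{segment["start"]:.2f}-{segment["end"]:.2f}: {segment["text"]}')

Because the tool returns fluent text with no marker distinguishing transcribed speech from invented content, downstream users have no built-in way to spot a hallucinated sentence.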
The full extent of the problem is difficult to determine, but researchers and engineers say they regularly encounter Whisper "hallucinations" in their work.
A researcher at the University of Michigan said he found “hallucinations” in eight out of ten audio transcriptions he examined. A computer engineer found “hallucinations” in about half of the transcriptions of more than 100 hours of audio he analyzed. Another developer said he found “hallucinations” in nearly all of the 26,000 transcripts he created with Whisper.
The problem persists even with short, clearly recorded audio samples. A recent study by computer scientists found 187 “hallucinations” in more than 13,000 clear audio clips they examined. That trend would lead to tens of thousands of faulty transcriptions across millions of recordings, the researchers said.
Such errors can have “very serious consequences,” especially in hospital settings, according to Alondra Nelson, who headed the White House Office of Science and Technology Policy in the Biden administration until last year.
“No one wants a misdiagnosis,” said Nelson, now a professor at the Institute for Advanced Study in Princeton, New Jersey. “There needs to be a higher standard.”
Whisper is also used to create captions for the deaf and hard of hearing—a population at particular risk from faulty transcriptions. This is because deaf and hard of hearing people have no way to identify fabricated passages “hidden in all the other text,” says Christian Vogler, who is deaf and directs the Technology Access Program at Gallaudet University.
OpenAI is called upon to solve the problem
The prevalence of such “hallucinations” has led experts, advocates, and former OpenAI employees to call on the federal government to consider AI regulations. At a minimum, they said, OpenAI needs to address the flaw.
“This problem is solvable if the company is willing to prioritize it,” said William Saunders, a research engineer in San Francisco who left OpenAI in February over concerns about the company’s direction.
“It’s a problem if you release it and people become so confident about what it can do that they integrate it into all these other systems,” he said. An OpenAI spokesperson said the company is constantly working on ways to mitigate these “hallucinations,” values researchers’ findings, and incorporates feedback into model updates.
While most developers assume that speech-to-text tools can misspell words or make other mistakes, engineers and researchers say they have never seen an AI-powered transcription tool that “hallucinates” as much as Whisper.
Source: https://tuoitre.vn/cong-cu-ai-chuyen-loi-noi-thanh-van-ban-cua-openai-bi-phat-hien-bia-chuyen-20241031144507089.htm