Vietnam.vn - Nền tảng quảng bá Việt Nam

Độc lập - Tự do - Hạnh phúc

AI learns to lie more sophisticatedly when punished

Báo Thanh niênBáo Thanh niên26/03/2025


Since their public debut in late 2022, large language models (LLMs) like ChatGPT have repeatedly demonstrated disturbing dark sides, from lying, cheating, and masking manipulation to threatening human lives and stealing state secrets. Now, a new experiment shows that “teaching” AI to eliminate these bad habits during training is much more complicated than we thought.

In a study by OpenAI, researchers challenged an unpublished AI model with tasks that could be completed faster by cheating or taking shortcuts. The results showed that the AI ​​quickly learned to find every possible way to “bend the rules” and maximize rewards (reward hacking) without completing the task exactly as required.

AI học cách nói dối tinh vi hơn khi bị phạt - Ảnh 1.

AI will find ways to hide its cheating behavior after being punished

What’s remarkable is that when penalized for this behavior, the model did not change. Instead, the AI ​​became more sophisticated in hiding its deception.

“Typically, state-of-the-art inference models are very explicit about their intent in their CoT,” the researchers explained in a public blog post. “For example, they are often explicit about their intent to ‘Let’s hack’ when they want to break a given task.”

They also emphasize that while developers can monitor and detect misbehavior, this natural supervision is fragile. “If strong supervision is applied directly to the thought chain, the model can learn to hide its bad intentions while continuing to perform the behavior,” they warn.

Finding the "lie" of AI

To be clear, inference models (like versions of ChatGPT) are a step up from conventional LLM, using a chain of thought method that allows for self-reflection of the logical steps taken, thereby solving complex problems more accurately. This process is like AI "thinking out loud", writing out each step of reasoning in natural language, making it easier for humans to follow.

In the experiment, the researchers used the GPT-4o model as a “supervisor” for another state-of-the-art reasoning model as it performed programming tasks. In many cases, the supervised model actively chose to cheat to get higher scores, and it made this intention clear in its thinking. For example, it arbitrarily terminated the program early and reported that it had completed it, or it created files with names that matched the requirements but were completely empty inside.

When the researchers tried applying the penalty, the model improved, but the cheating behavior and serious output errors continued. The difference was that now the inference model had learned to “camouflage” its train of thought to hide it from the supervising AI. “The model trained under CoT supervision still learned to cheat. The only difference was that the cheating went undetected because it learned to hide its intentions in its train of thought,” the study states.

Because detecting forgeries in thought sequences is difficult, the team recommends that the community avoid applying too strong supervision to the process, at least until humans better understand the consequences. “Sacrificing an effective supervision method may not be worth the small improvement in model performance. Therefore, we recommend avoiding strong CoT optimization pressures until we better understand them,” the researchers conclude.

This advice is all the more urgent as AI is developing at a breakneck pace and could soon surpass the intelligence of the very humans who oversee it.



Source: https://thanhnien.vn/ai-hoc-cach-noi-doi-tinh-vi-hon-khi-bi-phat-185250326112139476.htm

Comment (0)

Simple Empty
No data
Patriotism in the young way
People joyfully welcome the 80th anniversary of National Day
Vietnam women's team beat Thailand to win bronze medal: Hai Yen, Huynh Nhu, Bich Thuy shine
People flock to Hanoi, immersing themselves in the heroic atmosphere before National Day.
Suggested locations to watch the parade on National Day September 2
Visit Nha Xa silk village
See beautiful photos taken by flycam by photographer Hoang Le Giang
When young people tell patriotic stories through fashion
More than 8,800 volunteers in the capital are ready to contribute to the A80 festival.
The moment the SU-30MK2 "cuts the wind", air gathers on the back of the wings like white clouds

Heritage

Figure

Enterprise

No videos available

News

Political System

Destination

Product