More ‘human-like’ GPT-4o thrills world AI enthusiasts
Published: May 14, 2024 08:45 PM
Photo: VCG

Many AI enthusiasts spent another sleepless night after OpenAI announced on Monday a new flagship model that has industry professionals buzzing. The model can translate more than 50 languages in real time, and it is also more "human-like," with response times now comparable to those of humans. Additionally, it can sing, recognize emotions and provide users with "emotional value" even better than humans can.

In the announcement, OpenAI said the new GPT-4o model provides GPT-4-level intelligence but is much faster, with improved capabilities across text, voice and vision.

When the Global Times asked on OpenAI's platform what GPT-4o can do, the newest model gave the following response: "Here are some key capabilities of GPT-4o such as natural language understanding and generation: GPT-4o excels at understanding and generating human-like text, enabling it to hold conversations, answer questions and provide detailed explanations." GPT-4o is notably better at vision and audio understanding than existing models, the company said.

The new model, affectionately dubbed a "digital personal assistant" by netizens, can engage in real-time spoken conversations. For example, in the Monday demonstration, OpenAI executives asked it to solve a math problem and to tell a bedtime story with various levels of "drama" in its voice. The model completed the tasks convincingly.

The AI then told the story in a stereotypical robot voice, as the executives requested, and then again in a sing-song fashion that made everyone laugh.

The entire exchange was smooth and natural, no different from talking with a human being. At one point, upon hearing an executive panting, GPT-4o told him to "calm down," joking, "You're not a vacuum cleaner."

OpenAI Chief Technology Officer Mira Murati said the updated version of ChatGPT will now also have memory capabilities, meaning it can learn from previous conversations with users, and can do real-time translation, CNN reported. According to the company, the tool now accommodates more than 50 languages.

Some netizens on social media platform X shared a demo in which students shared their iPad screen with the new GPT-4o, with the AI speaking with them and helping them learn in real time.

"Imagine giving this to every student in the world," a netizen said. "The future is so, so bright."

A Beijing resident surnamed Chen told the Global Times on Tuesday, after trying the new model first thing in the morning, that "the most impressive part was the live demonstration. During the voice conversation with GPT-4o, three people interrupted at random, yet GPT-4o responded extremely quickly and with a very rich tone. It was just like chatting with a human."

Noting that many of her friends work in the translation and interpretation field, Chen said with some concern, "GPT-4o also served as a real-time translator at the event, seamlessly translating between Italian and English. It felt like simultaneous interpreters might be out of a job soon."

Shen Yang, a professor studying AI and media at Tsinghua University in Beijing, told the Global Times on Tuesday that the primary goal of this upgrade is to enlarge OpenAI's user base through collaboration with the iPhone's Siri, in the hope of expanding from the current 100 million weekly active users to a billion.

"This upgrade marks a shift from simulating consciousness to simulating life, with a focus on voices, images and visual elements. Additionally, there is a significant market potential in hardware devices embedding AI, where GPT-4o will play a role in better understanding the world," Shen said.

For AI, the most important aspects are its reasoning and intelligence capabilities, according to the expert. Shen believes the new model "now equates to a doctoral level" in terms of problem-solving abilities. In terms of image processing, the enhancements are quite noticeable, including improved image consistency, fewer AI hallucinations and better integration of text and image scenes.

"I believe there is indeed a gap between China and the US [in terms of AI technology], and I have always insisted on this," Zhou Hongyi, founder and chairman of 360 Security Technology, told the Global Times in a previous interview. "Only by recognizing the gap can we know how to catch up. If you don't admit there is a gap and think we are all far ahead, there is no chance in catching up."

However, Zhou said the main difference between China and the US in AI lies in "determining the technical direction." Once the direction is determined, he said, China's strong advantage in rapid learning means the gap will be narrowed within one or two years. The year 2024 may become the "year of application" for China in the field of AI, Zhou noted.