StepFun speech model tops global Speech Reasoning with 96.4 percent accuracy

By Chen Qingrui Published: Jan 15, 2026 11:43 PM

Artificial intelligence Photo: VCG

Chinese large language model startup StepFun's speech model Step-Audio R1.1 (Realtime) ranked first globally in the Speech Reasoning category with an accuracy rate of 96.4 percent, according to data from industry benchmarking platform Artificial Analysis.

The achievement marks the nation's latest high-profile showing in global competition among large language models. StepFun said on its official WeChat account that the result set a new record. The model has been open-sourced.

Artificial Analysis' Speech Reasoning benchmark is regarded as one of the industry's most authoritative third-party standards for evaluating native audio models, with a core focus on models' ability to process audio directly and perform complex logical reasoning. Key metrics include accuracy and first-packet latency, StepFun said in its WeChat post.

Step-Audio R1.1 (Realtime) has outperformed leading peers, including Grok, Gemini and GPT-Realtime, according to the ranking.

According to Artificial Analysis, Step-Audio R1.1 (Realtime) delivers the strongest cost-performance advantage among its peers and also leads comparable speech models in the overall balance of performance and speed.

The current Step-Audio R1.1 is the latest upgraded version, offering stronger real-time conversational capabilities alongside more advanced speech reasoning, according to the company.

A full real-time voice application programming interface (API) is set to launch in February, while the currently available chat mode already runs on the R1.1 core and supports streaming, think-while-speaking reasoning.

Examples shared by the company show that the model can be used to analyze videos of cats fighting, identifying their emotions and intended signals, while separating human voices and quickly recognizing the status of different characters in the footage.

Native speech recognition has long been a closely watched path in AI development, as it preserves intonation, emotion and other nuances that are often lost in traditional speech-to-text systems, bringing machines closer to real human communication, Ma Jihua, a veteran telecom industry analyst, told the Global Times on Thursday.

Powered by breakthroughs in technological innovation, real-world application and ecosystem building, China's large language models are moving into the global top tier, emerging as a foundational intelligent infrastructure driving a leap in productivity, according to the Xinhua News Agency.

Chinese experts also highlighted that achieving performance breakthroughs through algorithmic innovation under constrained resources underscores the soundness of China's chosen development path in science and technology.

On Thursday, the internationally recognized large model benchmarking platform LMArena released its latest rankings, offering favorable evaluations of several Chinese large language models across key performance metrics.

Baidu's ERNIE-5.0-0110 achieved a score of 1,460, ranking No. 1 among Chinese models and No. 8 globally on the LMArena Text leaderboard, according to the list.

Notably, Baidu's ERNIE previously ranked second globally and first in China for text capabilities in LMArena's rankings released late last year, according to Science and Technology Daily.

In Thursday's rankings, ERNIE-5.0-0110 outperformed several leading models, including GPT-5.1-High and Gemini-2.5-Pro, demonstrating strong text capabilities, according to LMArena.

In addition, according to the website's Tuesday rankings, M2.1, a model from China-based start-up MiniMax, ranked No. 7 globally on web development tasks in the Code Arena.

Ma said China holds clear advantages in AI development, including a deep talent pool, a consistently high volume of intellectual property patents, and a wide range of real-world application scenarios. These strengths enable multi-track innovation, iterative upgrades through practical deployment, and the emergence of a low-cost, high-efficiency AI ecosystem driven by continuous innovation.

During the 14th Five-Year Plan period (2021-25), China achieved new advances in integrated circuits, artificial intelligence and foundational software, with more than 700 generative AI models completing regulatory filing, embodied intelligence steadily moving toward industrial application, and intelligent agents seeing accelerated innovation, according to another Xinhua report.
