Key Points
- OpenAI introduced the “Strawberry” series of AI models, designed to solve complex problems in science, coding, and math.
- The o1 model, part of the Strawberry series, uses “chain-of-thought” reasoning to break down complex tasks, resulting in significant performance improvements.
- The o1 model scored 83% on the qualifying exam for the International Mathematical Olympiad and exceeded human PhD-level accuracy on a benchmark of science problems.
- OpenAI has automated the “chain-of-thought” technique, allowing the models to independently tackle complex problems without user prompting.
OpenAI, backed by Microsoft, has unveiled its new “Strawberry” series of AI models, designed to improve performance on complex tasks by spending more time processing queries. The series aims to tackle harder problems in science, coding, and mathematics and to outperform previous models. The models, code-named “Strawberry” internally and officially launched as “o1” and “o1-mini,” will be available in ChatGPT and through OpenAI’s API starting Thursday, the company announced.
The development of the Strawberry series marks a significant step forward in AI’s ability to reason through challenging problems. Noam Brown, an OpenAI researcher who specializes in improving reasoning in the company’s models, confirmed the launch on the social media platform X and expressed excitement about the models’ general reasoning capabilities. He said the models are the result of OpenAI’s efforts to enhance AI’s cognitive skills.
OpenAI’s blog post reported that the o1 model scored 83% on the qualifying exam for the International Mathematical Olympiad, a significant leap from the 13% scored by the previous model, GPT-4o. The o1 model also performed better on competitive programming questions and surpassed human PhD-level accuracy on a benchmark of science problems.
The enhanced performance of the Strawberry models is attributed to a technique called “chain-of-thought” reasoning. This approach breaks complex problems down into smaller, logical steps, allowing the AI to better understand tasks and solve them incrementally. While this technique has been used as a prompting method in AI research, OpenAI has now automated the process, enabling the models to decompose problems independently, without user intervention.
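To illustrate chain-of-thought as a manual prompting method, the older approach that o1 is described as automating, the sketch below builds two prompts for the same question: one that asks only for an answer and one that asks for intermediate steps first. The function names and prompt wording are hypothetical, no model or API is called, and nothing here represents how OpenAI trained o1.

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer only, with no intermediate reasoning."""
    return f"Question: {question}\nAnswer:"


def chain_of_thought_prompt(question: str) -> str:
    """Ask the model to write out intermediate steps before answering.

    The instruction wording is an assumption chosen for illustration.
    """
    return (
        f"Question: {question}\n"
        "Work through the problem step by step, "
        "then give the final answer on its own line.\n"
        "Step 1:"
    )


if __name__ == "__main__":
    q = (
        "A train travels 120 km in 2 hours. "
        "How far does it go in 5 hours at the same speed?"
    )
    print(direct_prompt(q))
    print()
    print(chain_of_thought_prompt(q))
```

With the second prompt, the user has to request the step-by-step breakdown explicitly. According to OpenAI, the o1 models produce that kind of breakdown on their own.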
“We trained these models to spend more time thinking through problems before they respond, much like a person would,” OpenAI stated. “Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.” This method allows the AI models to improve their problem-solving abilities by iteratively adjusting their approach. OpenAI’s efforts to enhance AI reasoning began under the project name Q*, first reported in November 2023; the project was renamed Strawberry in July 2024.