
Why is ‘Strawberry’ a Cherry on the AI Cake?

  • OpenAI’s latest model ChatGPT-o1 is being hailed for its human-like reasoning abilities and for setting new standards in AI.
  • ChatGPT-o1 is stated to possess true general reasoning capabilities, excelling in a range of benchmark tests.
  • With o1’s advanced reasoning competitive edge, the overly crowded LLM landscape will shift in OpenAI’s favor.

On September 12, OpenAI unveiled its latest model, ChatGPT-o1, hailed for its human-like reasoning abilities and for setting new standards in AI. Dubbed the “Strawberry Model”, it had been eagerly awaited by OpenAI leadership, including CEO Sam Altman and former Chief Scientist Ilya Sutskever. ChatGPT-o1 is stated to possess true general reasoning capabilities, excelling in a range of benchmark tests.

Benchmark Comparison of GPT-4o, GPT-o1 Preview and GPT-o1 (to be released) in the Fields of Math, Coding and Science

Source: OpenAI

The chart above shows how o1, compared to GPT-4o, lifts LLM reasoning from a merely acceptable level to an outstanding one. Without specialized training, it can achieve gold-medal results in mathematical Olympiads and reach the 89th percentile in coding. In PhD-level science Q&A tests, it even surpasses human experts.

What makes o1 a breakthrough?

  • Adding Reinforcement Learning to LLM training

Reinforcement Learning (RL) trains AI through a cycle of rewards and consequences. By exploring various actions, and learning from the outcomes via a reward system, AI adjusts its behavior to optimize results. This process naturally creates a data flywheel, continuously generating training datasets with both positive and negative feedback, further refining the AI’s performance. A notable example is the essential role of RL in AlphaGo’s training process. This method significantly improves the reliability and accuracy of LLM outputs.
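
The reward cycle described above can be illustrated with a toy example. The sketch below is a generic epsilon-greedy agent choosing among three actions with hidden reward probabilities; it is not OpenAI’s training method, which has not been published in detail. The function name `run_bandit`, the reward probabilities and the parameter values are all assumptions chosen for illustration.

```python
import random


def run_bandit(true_reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy agent: explore actions, learn from rewards."""
    rng = random.Random(seed)
    n_actions = len(true_reward_probs)
    counts = [0] * n_actions      # how often each action was tried
    values = [0.0] * n_actions    # running estimate of each action's reward

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment returns a reward of 1 (positive) or 0 (negative).
        reward = 1.0 if rng.random() < true_reward_probs[action] else 0.0

        # Update the running average for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


# Hypothetical environment: action 2 pays off most often.
values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("best action:", best_action)
print("estimated rewards:", [round(v, 2) for v in values])
```

Every step produces a new (action, reward) pair, which is the small-scale analogue of the data flywheel mentioned above: the agent’s own behavior generates the feedback it learns from.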

  • Hidden chain of thought (CoT) in the ‘thinking’ process

One of the key challenges with LLMs is their tendency to hallucinate, often criticized as “stochastic parroting”. To address this, OpenAI’s o1 model employs a structured, step-by-step reasoning process, akin to deliberate human thought. By breaking complex tasks into simpler components, it improves accuracy and problem-solving efficiency. If one approach fails, the model tries alternative strategies, which enhances its overall reasoning capabilities and versatility.

  • Scaling law on inference phase

The chart below shows how o1’s (or “Strawberry’s”) allocation of inference-time compute differs from that of most LLMs. We often talk about the scaling law: model performance improves as one increases key factors such as model size (number of parameters), dataset size and compute power. In this case, OpenAI has shifted a larger share of compute to o1’s inference (thinking) stage.

Inference Time Compute Allocation: o1 vs Most LLMs

Source: Jim Fan, NVIDIA

As a result, the scaling law still holds: the more time o1 spends thinking, the more its reasoning capabilities improve.
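
A simple way to see why extra inference compute can buy accuracy is majority voting over several independent attempts. The sketch below is only an analogy, not a description of how o1 works internally: `noisy_solver` is a hypothetical stand-in for a single reasoning attempt that is right 40% of the time, and the 40% figure, the ten possible answers and the trial counts are all assumed values.

```python
import random
from collections import Counter


def noisy_solver(correct_answer, p_correct, rng):
    """Hypothetical single attempt: right with probability p_correct."""
    if rng.random() < p_correct:
        return correct_answer
    # Wrong answers are scattered across the nine other options.
    return rng.choice([a for a in range(10) if a != correct_answer])


def majority_vote_accuracy(n_samples, p_correct=0.4, trials=2000, seed=0):
    """Accuracy when the most common of n_samples attempts is the final answer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        correct = rng.randrange(10)
        votes = Counter(
            noisy_solver(correct, p_correct, rng) for _ in range(n_samples)
        )
        answer, _ = votes.most_common(1)[0]
        hits += answer == correct
    return hits / trials


# More attempts per question means more inference compute per question.
results = {n: majority_vote_accuracy(n) for n in (1, 5, 25)}
for n, accuracy in results.items():
    print(f"{n:>2} attempts per question -> accuracy {accuracy:.2f}")
```

In this toy setting the underlying solver never changes; only the amount of compute spent per question does, yet accuracy rises with it. The cost is that each answer takes proportionally more work.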

What are the implications?

  • With o1’s advanced reasoning competitive edge, the overly crowded LLM landscape will shift in OpenAI’s favor. This lead will last until competitors start deploying a similar architecture and similar techniques. The chart below shows how o1 outperforms its peers in IQ tests. Such a technological breakthrough will also help OpenAI’s current funding efforts at a valuation of $150 billion.

Leading LLMs’ Performance Comparison

Source: Maxim Lott on X, Mensa

  • The advancements of o1 are poised to rapidly accelerate the deployment of LLMs in industries that rely heavily on complex reasoning, such as STEM education, legal services, scientific discovery, and research. In STEM education, o1’s enhanced reasoning abilities will enable the creation of adaptive learning platforms that can guide students through intricate problems, fostering a deeper understanding of advanced subjects like mathematics, physics and engineering. In the legal field, o1 can support the reasoning involved in legal research and case analysis. In science, o1 will expedite the analysis of data and literature, uncovering new insights and driving innovation across disciplines.
  • OpenAI’s o1 has excelled in complex coding tasks, setting the stage for more advanced AI agents and streamlined agentic workflows. As reasoning is a key component of AI agents, o1’s superior reasoning abilities enable them to efficiently break down difficult coding challenges into manageable steps and discover optimal solutions. For instance, Devin, the popular AI coding agent, was found to be better at accurately diagnosing the underlying causes of issues in lengthy code when powered by o1.

OpenAI o1 Improves Coding Agent’s Performance Compared to GPT-4o

Source: Cognition AI

  • Humanoids and robotics will benefit from the OpenAI o1 model due to its reasoning abilities. Through enhanced CoT processing, o1 enables robots to make complex decisions with greater accuracy, allowing them to tackle tasks that demand deeper, multi-step reasoning. This is particularly impactful in scenarios where robots must navigate dynamic environments or carry out intricate tasks.
  • OpenAI o1 demonstrates that compute can be scaled along two main avenues: during training and during inference. Since more inference compute time leads to higher reasoning capability, we foresee a shift in computing emphasis from training to inference, representing a fundamental change in how AI approaches different task executions.
  • This opens a significant opportunity for NVIDIA’s competitors, like Groq and SambaNova Systems, which are better positioned to compete in the inference compute market, where their chances of success are much stronger than in the training space.
  • However, the o1 model’s demand for more power during inference leads to longer processing times, a direct consequence of its advanced reasoning capabilities. This highlights the need for a careful balance between speed and precision during the engineering stage, ensuring the model is applied in scenarios where its strengths, such as detailed reasoning, outweigh the slower response times.
  • Moreover, the o1 model has a notably higher cost-performance ratio of 3-4, making it more expensive than ChatGPT-4o. Therefore, for applications that prioritize speed, such as customer service or simple data analysis, the increased cost and slower response time may outweigh the benefit of its advanced reasoning.

Wei is a senior consultant at Counterpoint specializing in Artificial Intelligence. She is also the China founder of Humanity+, an international non-profit organization which advocates the ethical use of emerging technologies. She formerly served as a product manager of Embedded Industrial PC at Advantech. Before that she was an MBA consultant to Nuance Communications where her team successfully developed and launched Nuance’s first B2C voice recognition app on iPhone (later became Siri). Wei’s early years in the industry were spent in IDC’s Massachusetts headquarters and The World Bank’s DC headquarters.
