OpenAI rolled out its new o1 models Thursday. But first, an explanation for ChatGPT users: these AI models pause to "think" before they answer. There's been plenty of hype building to these models, codenamed "Strawberry" inside OpenAI. Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 is quite good at reasoning and other complex question-answering tasks, but it costs roughly four times as much to run as GPT-4o. The latest model also lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. OpenAI even admits on its help page that "GPT-4o is still the best option for most prompts" and notes elsewhere that o1 struggles at simpler tasks.
"It's impressive, but I think the improvement is not very significant," said Ravid Shwartz Ziv, who teaches at NYU and studies AI models. "It's better at certain problems, but you don't have this across-the-board improvement."
OpenAI o1 is unusual in that it "thinks" before answering, breaking down big problems into small steps and trying to identify when it gets one of those steps right or wrong. This "multi-step reasoning" isn't a new idea; researchers have proposed it for years, and You.com uses it for complex queries, but it hadn't been practical until recently.
"There's a lot of excitement in the AI community," said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. "If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you're trying to work through."
OpenAI o1 is also quite expensive. With most models, you pay for input tokens and output tokens. But o1 adds a hidden process in which the model breaks big problems into smaller steps, producing a large amount of compute you never see. OpenAI is keeping some details of this process hidden to maintain a competitive advantage. That said, you still pay for it in the form of "reasoning tokens." That's all the more reason to be careful about how you use OpenAI o1, so you don't end up paying a gazillion tokens just to ask what the capital of Nevada is.
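The billing mechanics above can be sketched in a few lines of code. This is a rough illustration only: the per-token prices are placeholder assumptions rather than OpenAI's actual rates, `estimate_cost` is a hypothetical helper, and the assumption that hidden reasoning tokens are billed at the output-token rate should be checked against OpenAI's pricing page.

```python
# Rough sketch of how a bill for an o1-style model could add up, assuming
# hidden "reasoning tokens" are billed at the same rate as output tokens.
# Prices below are illustrative placeholders, not OpenAI's actual rates.

INPUT_PRICE_PER_1M = 15.00    # assumed $ per 1M input tokens
OUTPUT_PRICE_PER_1M = 60.00   # assumed $ per 1M output tokens

def estimate_cost(input_tokens: int, visible_output_tokens: int,
                  reasoning_tokens: int) -> float:
    """Reasoning tokens never appear in the response but are still billed,
    here assumed to count as output tokens."""
    billed_output = visible_output_tokens + reasoning_tokens
    cost = (input_tokens * INPUT_PRICE_PER_1M
            + billed_output * OUTPUT_PRICE_PER_1M) / 1_000_000
    return round(cost, 4)

# Even a trivial question can rack up reasoning tokens you never see:
print(estimate_cost(input_tokens=20, visible_output_tokens=50,
                    reasoning_tokens=3_000))
```

The point of the sketch: the visible answer (50 tokens) is a small fraction of what gets billed, because the invisible reasoning step dominates the charge.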
As an illustration, I asked ChatGPT o1-preview to help my family plan Thanksgiving, a task that could arguably benefit from some unbiased logic and reasoning. Specifically, I wanted help figuring out whether two ovens would be enough to cook Thanksgiving dinner for 11 people, and whether we should consider renting an Airbnb to get access to a third oven.
After "thinking" for 12 seconds, ChatGPT wrote me a 750-plus word response, ultimately concluding that two ovens would be enough with some careful planning, and would let my family save on costs and spend more time together. It broke the problem down step-by-step and explained how it had accounted for all these factors, including costs, family time, and oven management.