202412181141
Status: #idea
Tags: #ai #deep_learning #transformers #computer_science
# Many forget that the Bitter Lesson includes both search AND deep learning
Richard Sutton's "The Bitter Lesson", a cult classic in the field of AI, emphasizes the importance of using methods that scale with more data and compute. He argues that expert-designed systems are appealing, both aesthetically and because they can eke out SOTA performance in the short run, but that they are fundamentally limited in how much they can improve as more data and compute become available. Since data and compute have both been growing exponentially, over any horizon longer than 2-3 years the better-scaling methods will start to outperform the expert-designed systems.
Many take this article as a justification for deep learning, but they forget that Sutton specifically calls out deep learning AND search. Both methods improve as more data and compute become available. New approaches like OpenAI's o1 models are the end result of taking the Bitter Lesson seriously: combining search and deep learning so that both scalable paradigms are present in a single approach.
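As a toy illustration of the combination (not OpenAI's actual method, which is unpublished), the sketch below wraps a simple best-of-N search around two stand-in "learned" components: `propose` and `score` are hypothetical placeholders for a neural generator and verifier. The point is the scaling knob: spending more test-time compute (a larger `n_candidates`) widens the search over the learned model's outputs.

```python
import random

# Toy stand-ins for learned components. In a real system both would be
# neural networks (a generator/policy and a verifier/value model); here
# they are hypothetical placeholders so the sketch runs on its own.

def propose(prompt: str, rng: random.Random) -> str:
    """Sample one candidate solution (stand-in for a learned generator)."""
    return f"{prompt} :: candidate-{rng.randint(0, 9999)}"

def score(candidate: str) -> float:
    """Rate a candidate (stand-in for a learned verifier)."""
    # Crude deterministic score for the demo; a real verifier would be learned.
    return (sum(ord(c) for c in candidate) % 1000) / 1000.0

def best_of_n(prompt: str, n_candidates: int, seed: int = 0) -> str:
    """Search layered on learning: sample n candidates from the generator,
    keep the one the verifier rates highest. More compute = wider search."""
    rng = random.Random(seed)
    candidates = [propose(prompt, rng) for _ in range(n_candidates)]
    return max(candidates, key=score)

if __name__ == "__main__":
    # Scaling test-time compute: the search widens as n grows.
    for n in (1, 8, 64):
        print(n, best_of_n("What is 17 * 24?", n))
```

The same structure generalizes to beam search or MCTS over reasoning steps; in every variant, the search component and the learned component each get better with more compute, which is exactly the property the Bitter Lesson rewards.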
---
# References