A Theory of Learning with Autoregressive Chain of Thought
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
Auto-Regressive Next-Token Predictors are Universal Learners