A Theory of Learning with Autoregressive Chain of Thought

CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

Auto-Regressive Next-Token Predictors are Universal Learners