1. LLM Basics
- LLMs predict the next token based on context.
- They operate over tokens, not words.
- Built on the Transformer architecture (self-attention mechanism).
- GPT, Codex, and ChatGPT are decoder-only transformers.
- Self-attention lets the model weigh all previous tokens at once, enabling contextual understanding (sketched below).
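
A minimal NumPy sketch of causal self-attention over a sequence of token embeddings. It is a single head with no learned projections (the embeddings serve directly as queries, keys, and values), so it illustrates the mechanism rather than a real transformer layer:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X):
    """X: (seq_len, d) matrix of token embeddings."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)               # scaled similarity of every token pair
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # causal mask: no attending to future tokens
    weights = softmax(scores, axis=-1)          # each token's attention distribution
    return weights @ X                          # context-aware token representations

tokens = np.random.randn(5, 8)                  # 5 tokens, 8-dimensional embeddings
print(causal_self_attention(tokens).shape)      # (5, 8)
```

Each output row mixes information from that token and every earlier token, which is what lets the model use the whole preceding context at every prediction step.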
2. How Reasoning Emerges
- Although LLMs only ever predict the next token, that prediction follows the reasoning patterns that were most likely in their training data.
- The model has seen millions of examples of step-by-step reasoning, explanations, and logic chains.
- Reasoning is an emergent behavior from large-scale pattern learning, not explicit logic execution.
- Self-attention lets the model relate distant parts of the context, which supports coherent multi-step reasoning.
3. Chain-of-Thought (CoT)
- LLMs often produce better answers by first generating a chain of reasoning, then the final answer; see the example prompts after this list.
- In modern reasoning models, this chain-of-thought happens internally and is hidden for safety and reliability.
- Older models (like Codex) exposed chain-of-thought openly if prompted.
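
For illustration, here is the same question posed directly and with the classic zero-shot chain-of-thought trigger phrase; the question itself is just a stock example:

```python
# Two ways to prompt the same question.
question = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?\n"
)

direct_prompt = question + "A:"
# "Let's think step by step" is the classic zero-shot CoT trigger phrase;
# it nudges the model to write out intermediate reasoning before answering.
cot_prompt = question + "A: Let's think step by step."
```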
4. How Reasoning Models Work (o-series)
- They generate internal reasoning steps (a scratchpad), iterate over them, refine them, and feed the results back into the model; see the sketch after this list.
- Only the final summarized answer is shown to the user.
- “Analyzing…” is just UI; real reasoning happens internally.
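
A conceptual sketch of such a loop. The helpers `generate`, `critique`, and `summarize` are hypothetical stand-ins for model calls (stubbed here so the sketch runs); the real o-series internals are not public:

```python
def generate(prompt: str) -> str:
    return f"[reasoning about: {prompt[:40]}...]"  # stub: imagine model-written steps

def critique(scratchpad: str) -> str:
    return "ok"                                    # stub: imagine a model self-check

def summarize(scratchpad: str) -> str:
    return "final answer"                          # stub: distilled user-facing reply

def answer_with_hidden_reasoning(question: str, max_rounds: int = 3) -> str:
    scratchpad = generate(f"Reason step by step: {question}")
    for _ in range(max_rounds):
        feedback = critique(scratchpad)            # model reviews its own scratchpad
        if feedback == "ok":                       # reasoning looks sound: stop refining
            break
        scratchpad = generate(f"Refine:\n{scratchpad}\nFeedback: {feedback}")
    return summarize(scratchpad)                   # only this summary reaches the user

print(answer_with_hidden_reasoning("How many days are in a leap year?"))
```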
5. Codex vs Modern GPT Models
- Codex (2021) is older, fine-tuned from GPT-3 for code.
- GPT-4, GPT-5, and reasoning models are far more advanced, with improved alignment and internal reasoning loops.
- Codex followed prompts literally and showed what it was “thinking”.
- Newer models hide reasoning and are trained to produce only the final answer.
6. Why Codex Exposed Its Thoughts
- Codex lacked instruction tuning and safety layers.
- Not trained to hide chain-of-thought.
- Behaved like a raw model completing text directly, including thinking steps.
7. Why Modern GPT Models Hide Chain-of-Thought
- To avoid misleading or inaccurate reasoning steps.
- To prevent training data exposure.
- To ensure safer and more reliable outputs.
- Chain-of-thought is summarized into a concise, user-facing explanation.
8. Modern Reasoning Model Architecture (Simplified)
- Generates internal reasoning tokens.
- Feeds them back into itself (deliberate reasoning).
- Runs multiple forward passes or branches (tree-of-thought); see the toy sketch after this list.
- Produces final distilled answer without exposing internal logic.
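
A toy sketch of the branching step in the tree-of-thought style: propose several candidate reasoning steps, score them, and keep only the most promising. `propose` and `score` are hypothetical stand-ins for model calls:

```python
import random

def propose(state):
    # Stub: imagine the model suggesting 3 alternative next reasoning steps.
    return [f"{state} -> step{random.randint(0, 99)}" for _ in range(3)]

def score(state):
    # Stub: imagine the model rating how promising a partial chain looks.
    return random.random()

def tree_of_thought(question, depth=3, beam=2):
    frontier = [question]
    for _ in range(depth):
        candidates = [c for s in frontier for c in propose(s)]  # branch out
        candidates.sort(key=score, reverse=True)                # rank candidates
        frontier = candidates[:beam]                            # beam-style pruning
    return frontier[0]                                          # best chain wins

print(tree_of_thought("How to make 24 from 4, 7, 8, 8?"))
```

Only the conclusion of the winning chain would then be distilled into the user-facing answer.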
9. Timeline Summary
- GPT-2 (2019)
- GPT-3 (2020)
- Codex (2021) – fine-tuned GPT-3 for code.
- GPT-3.5 / ChatGPT (2022)
- GPT-4 (2023)
- GPT-4 Turbo (late 2023)
- o-series reasoning models (2024–2025)
- GPT-5 and GPT-5.1 (2025)