1. LLM Basics

  • LLMs predict the next token from the preceding context (see the sketch after this list).
  • They operate over tokens, not words.
  • Built on the Transformer architecture (self-attention mechanism).
  • Models like GPT, Codex, and ChatGPT use decoder-only transformers.
  • Self-attention allows the model to consider all previous tokens at once, enabling contextual understanding.
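
A minimal sketch of the prediction step, assuming nothing beyond numpy: given a context, the model assigns a score (logit) to every token in its vocabulary, and softmax turns those scores into probabilities. The vocabulary, context, and logits below are invented for illustration, not taken from a real model.

```python
# Toy next-token prediction. A real model's work is producing the logits;
# here they are hard-coded to show how a distribution over tokens is formed.
import numpy as np

# Hypothetical context: "The capital of France is"
vocab = ["Paris", "London", "banana", "the"]
logits = np.array([4.0, 2.5, -1.0, 0.5])        # invented scores, one per token

probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> probabilities
for token, p in zip(vocab, probs):
    print(f"{token:>8}: {p:.3f}")

print("greedy pick:", vocab[int(np.argmax(probs))])   # -> "Paris"
```

Sampling from `probs` instead of taking the argmax is what decoding settings like temperature and top-p control.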

2. How Reasoning Emerges

  • Although LLMs only predict tokens, each next token reflects the most likely reasoning pattern learned during training.
  • The model has seen millions of examples of step-by-step reasoning, explanations, and logic chains.
  • Reasoning is an emergent behavior from large-scale pattern learning, not explicit logic execution.
  • Self-attention lets the model relate distant parts of the context, enabling logical flow (see the sketch after this list).
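
A minimal causal self-attention sketch in numpy, to make the "distant parts of the context" point concrete: every position attends to itself and all earlier positions, so an early token can directly influence a much later one. Dimensions and weights are random placeholders, not a trained model.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d) token embeddings; returns context-mixed embeddings."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # scaled dot products
    mask = np.triu(np.ones(scores.shape, bool), 1)   # True above the diagonal
    scores[mask] = -np.inf                           # hide future tokens
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                    # softmax over past tokens
    return w @ V                                     # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 5, 8                                    # 5 tokens, 8-dim embeddings
x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)    # (5, 8)
```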

3. Chain-of-Thought (CoT)

  • Internally, LLMs often generate a chain of reasoning before giving the final answer.
  • In modern models, this chain-of-thought is hidden for safety and reliability.
  • Older models (like Codex) exposed their chain-of-thought openly if prompted (see the prompting sketch after this list).
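
A sketch of how chain-of-thought was elicited at the prompt level. `generate` here is a stand-in for whichever completion API you use; it is not a real library call, and the prompts are just examples.

```python
def generate(prompt: str) -> str:
    """Placeholder: wire this to your completion model/API of choice."""
    raise NotImplementedError

question = "A train leaves at 3:00pm and arrives at 6:30pm. How long is the trip?"

direct   = f"Q: {question}\nA:"                            # tends to yield just an answer
stepwise = f"Q: {question}\nA: Let's think step by step."  # tends to elicit visible steps

# With completion-style models such as Codex, the second prompt typically
# made the model write out intermediate reasoning before the final answer.
```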

4. How Reasoning Models Work (o-series)

  • They generate internal reasoning steps (a scratchpad), iterate over them, refine them, and feed the result back into the model (see the sketch after this list).
  • Only the final summarized answer is shown to the user.
  • “Analyzing…” is just UI; real reasoning happens internally.
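
A control-flow sketch of that loop, under stated assumptions: `ask` is a hypothetical single-call wrapper around a model, and the round count is arbitrary. The point is the shape of the loop, not a real o-series implementation (which is not public).

```python
def ask(prompt: str) -> str:
    """Placeholder for one model call; wire to your API of choice."""
    raise NotImplementedError

def answer_with_hidden_reasoning(question: str, rounds: int = 2) -> str:
    # 1. Draft a scratchpad of reasoning steps.
    scratchpad = ask(f"Reason step by step about: {question}")
    # 2. Loop: critique the scratchpad, then rewrite it using the critique.
    for _ in range(rounds):
        critique = ask(f"Find flaws in this reasoning:\n{scratchpad}")
        scratchpad = ask(f"Rewrite the reasoning, fixing these flaws:\n"
                         f"{critique}\n\nOriginal:\n{scratchpad}")
    # 3. Only a distilled answer leaves the function; the scratchpad does not.
    return ask(f"Given this reasoning:\n{scratchpad}\nState only the final answer.")
```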

5. Codex vs Modern GPT Models

  • Codex (2021) is an older model: GPT-3 fine-tuned for code.
  • GPT-4, GPT-5, and reasoning models are far more advanced, with improved alignment and internal reasoning loops.
  • Codex followed prompts literally and showed what it was “thinking”.
  • Newer models hide reasoning and are trained to produce only the final answer.

6. Why Codex Exposed Its Thoughts

  • Codex lacked instruction tuning and safety layers.
  • Not trained to hide chain-of-thought.
  • Behaved like a raw model completing text directly, including thinking steps.

7. Why Modern GPT Models Hide Chain-of-Thought

  • To avoid misleading or inaccurate reasoning steps.
  • To prevent training data exposure.
  • To ensure safer and more reliable outputs.
  • Chain-of-thought is summarized into a concise, user-facing explanation.

8. Modern Reasoning Model Architecture (Simplified)

  • Generates internal reasoning tokens.
  • Feeds them back into itself (deliberate reasoning).
  • Runs multiple forward passes or branches (tree-of-thought); see the branch-and-select sketch after this list.
  • Produces final distilled answer without exposing internal logic.
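
A branch-and-select sketch of the list above. `ask` and `score` are placeholders (a real system might score branches with a verifier model); sampling several reasoning paths and keeping the best one is the tree-of-thought idea in miniature.

```python
def ask(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder for one sampled model call."""
    raise NotImplementedError

def score(reasoning: str) -> float:
    """Placeholder: e.g. a verifier model or the model grading itself."""
    raise NotImplementedError

def branch_and_select(question: str, branches: int = 4) -> str:
    # Sample several independent reasoning paths (the "branches").
    paths = [ask(f"Reason step by step: {question}", temperature=1.0)
             for _ in range(branches)]
    best = max(paths, key=score)          # keep the most promising branch
    # Branches are discarded internally; only the distilled answer surfaces.
    return ask(f"Reasoning:\n{best}\nGive only the final answer.")
```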

9. Timeline Summary

  • GPT-2 (2019)
  • GPT-3 (2020)
  • Codex (2021) – fine-tuned GPT-3 for code.
  • GPT-3.5 / ChatGPT (2022)
  • GPT-4 (2023)
  • GPT-4 Turbo (late 2023)
  • GPT-5 and GPT-5.1 (2025)
  • o-series reasoning models (2024–2025)