1. LLM Basics

  • LLMs predict the next token from the preceding context (see the sketch after this list).
  • They operate over tokens, not words.
  • Built on the Transformer architecture (self-attention mechanism).
  • Models like GPT, Codex, and ChatGPT use decoder-only transformers.
  • Self-attention allows the model to consider all previous tokens at once, enabling contextual understanding.
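
A minimal sketch of the prediction step, assuming nothing beyond numpy: given a context, the model assigns a score (logit) to every token in its vocabulary, and softmax turns those scores into probabilities. The vocabulary, context, and logits below are invented for illustration, not taken from a real model.

```python
# Toy next-token prediction. A real model's work is producing the logits;
# here they are hard-coded to show how a distribution over tokens is formed.
import numpy as np

# Hypothetical context: "The capital of France is"
vocab = ["Paris", "London", "banana", "the"]
logits = np.array([4.0, 2.5, -1.0, 0.5])        # invented scores, one per token

probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> probabilities
for token, p in zip(vocab, probs):
    print(f"{token:>8}: {p:.3f}")

print("greedy pick:", vocab[int(np.argmax(probs))])   # -> "Paris"
```

Sampling from `probs` instead of taking the argmax is what decoding settings like temperature and top-p control.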

2. How Reasoning Emerges

  • Although LLMs only predict tokens, each next token reflects the most likely reasoning pattern learned during training.
  • The model has seen millions of examples of step-by-step reasoning, explanations, and logic chains.
  • Reasoning is an emergent behavior from large-scale pattern learning, not explicit logic execution.
  • Self-attention lets the model relate distant parts of the context, enabling logical flow (see the sketch after this list).
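
A minimal causal self-attention sketch in numpy, to make the "distant parts of the context" point concrete: every position attends to itself and all earlier positions, so an early token can directly influence a much later one. Dimensions and weights are random placeholders, not a trained model.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d) token embeddings; returns context-mixed embeddings."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # scaled dot products
    mask = np.triu(np.ones(scores.shape, bool), 1)   # True above the diagonal
    scores[mask] = -np.inf                           # hide future tokens
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                    # softmax over past tokens
    return w @ V                                     # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 5, 8                                    # 5 tokens, 8-dim embeddings
x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)    # (5, 8)
```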

3. Chain-of-Thought (CoT)

  • Internally, LLMs often generate a chain of reasoning before giving the final answer.
  • In modern models, this chain-of-thought is hidden for safety and reliability.
  • Older models (like Codex) exposed their chain-of-thought openly if prompted (see the prompting sketch after this list).
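
A sketch of how chain-of-thought was elicited at the prompt level. `generate` here is a stand-in for whichever completion API you use; it is not a real library call, and the prompts are just examples.

```python
def generate(prompt: str) -> str:
    """Placeholder: wire this to your completion model/API of choice."""
    raise NotImplementedError

question = "A train leaves at 3:00pm and arrives at 6:30pm. How long is the trip?"

direct   = f"Q: {question}\nA:"                            # tends to yield just an answer
stepwise = f"Q: {question}\nA: Let's think step by step."  # tends to elicit visible steps

# With completion-style models such as Codex, the second prompt typically
# made the model write out intermediate reasoning before the final answer.
```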

4. How Reasoning Models Work (o-series)

  • They generate internal reasoning steps (a scratchpad), iterate over them, refine them, and feed the result back into the model (see the sketch after this list).
  • Only the final summarized answer is shown to the user.
  • “Analyzing…” is just UI; real reasoning happens internally.
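
A control-flow sketch of that loop, under stated assumptions: `ask` is a hypothetical single-call wrapper around a model, and the round count is arbitrary. The point is the shape of the loop, not a real o-series implementation (which is not public).

```python
def ask(prompt: str) -> str:
    """Placeholder for one model call; wire to your API of choice."""
    raise NotImplementedError

def answer_with_hidden_reasoning(question: str, rounds: int = 2) -> str:
    # 1. Draft a scratchpad of reasoning steps.
    scratchpad = ask(f"Reason step by step about: {question}")
    # 2. Loop: critique the scratchpad, then rewrite it using the critique.
    for _ in range(rounds):
        critique = ask(f"Find flaws in this reasoning:\n{scratchpad}")
        scratchpad = ask(f"Rewrite the reasoning, fixing these flaws:\n"
                         f"{critique}\n\nOriginal:\n{scratchpad}")
    # 3. Only a distilled answer leaves the function; the scratchpad does not.
    return ask(f"Given this reasoning:\n{scratchpad}\nState only the final answer.")
```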

5. Codex vs Modern GPT Models

  • Codex (2021) is an older model: GPT-3 fine-tuned for code.
  • GPT-4, GPT-5, and reasoning models are far more advanced, with improved alignment and internal reasoning loops.
  • Codex followed prompts literally and showed what it was “thinking”.
  • Newer models hide reasoning and are trained to produce only the final answer.

6. Why Codex Exposed Its Thoughts

  • Codex lacked instruction tuning and safety layers.
  • Not trained to hide chain-of-thought.
  • Behaved like a raw model completing text directly, including thinking steps.

7. Why Modern GPT Models Hide Chain-of-Thought

  • To avoid misleading or inaccurate reasoning steps.
  • To prevent training data exposure.
  • To ensure safer and more reliable outputs.
  • Chain-of-thought is summarized into a concise, user-facing explanation.

8. Modern Reasoning Model Architecture (Simplified)

  • Generates internal reasoning tokens.
  • Feeds them back into itself (deliberate reasoning).
  • Runs multiple forward passes or branches (tree-of-thought); see the branch-and-select sketch after this list.
  • Produces final distilled answer without exposing internal logic.
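
A branch-and-select sketch of the list above. `ask` and `score` are placeholders (a real system might score branches with a verifier model); sampling several reasoning paths and keeping the best one is the tree-of-thought idea in miniature.

```python
def ask(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder for one sampled model call."""
    raise NotImplementedError

def score(reasoning: str) -> float:
    """Placeholder: e.g. a verifier model or the model grading itself."""
    raise NotImplementedError

def branch_and_select(question: str, branches: int = 4) -> str:
    # Sample several independent reasoning paths (the "branches").
    paths = [ask(f"Reason step by step: {question}", temperature=1.0)
             for _ in range(branches)]
    best = max(paths, key=score)          # keep the most promising branch
    # Branches are discarded internally; only the distilled answer surfaces.
    return ask(f"Reasoning:\n{best}\nGive only the final answer.")
```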

9. Timeline Summary

  • GPT-2 (2019)
  • GPT-3 (2020)
  • Codex (2021) – fine-tuned GPT-3 for code.
  • GPT-3.5 / ChatGPT (2022)
  • GPT-4 (2023)
  • GPT-4 Turbo (late 2023)
  • GPT-5 and GPT-5.1 (2025)
  • o-series reasoning models (2024–2025)