
1. What is a Vector?
A vector is an ordered list of numbers.
In AI, it represents data in numeric form so that algorithms can compute with it.
Example:
- RGB color → [255, 120, 90]
- Word “king” → [0.12, -0.98, 0.45, ...]
- Image embedding → 512–4096 numbers
- Sentence, user profile, product → vector
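A minimal sketch in NumPy (the values are illustrative, not real embeddings):

```python
import numpy as np

# An RGB color as a 3-dimensional vector
rgb = np.array([255, 120, 90])

# A made-up "king" embedding; real word vectors have hundreds of dimensions
king = np.array([0.12, -0.98, 0.45])

print(rgb.shape)  # (3,) -> a 3-dimensional vector
```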
2. Why Vectors Matter in AI
AI models cannot operate on raw text/images, so the data is first converted into vectors.
Vectors allow models to do:
- Similarity search
- Classification
- Clustering
- Recommendation
- Reasoning (via embedding spaces)
- Memory retrieval (RAG)
3. Vector Dimensions (d-dimensional vectors)
A vector has a dimension (size).
Example:
- Word2Vec → 300-dim
- Sentence embeddings → 768-dim
- CLIP image embeddings → 512-dim
- GPT embeddings → 1536, 3072, 4096+ dims
Higher dimension → more information encoded (at the cost of more compute and storage).
4. Types of Vectors in AI
a) Dense Vectors
Most common. All values present.
Used in: embeddings, deep learning.
Example: [0.23, -0.44, 0.91]
b) Sparse Vectors
Most values are 0.
Used in older ML (TF-IDF, bag-of-words).
Example: [0, 0, 34, 0, 0, 1, ...]
c) One-Hot Vectors
Only one “1”, everything else “0”.
Example:
word “cat”: [0,0,1,0,0]
Encodes identity only, not meaning → replaced by embeddings.
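All three types side by side, as a minimal NumPy sketch (the 5-word vocabulary is made up):

```python
import numpy as np

# Dense: every component carries information (typical of embeddings)
dense = np.array([0.23, -0.44, 0.91])

# Sparse: almost all zeros (typical of TF-IDF / bag-of-words counts)
sparse = np.zeros(10)
sparse[2], sparse[5] = 34, 1

# One-hot: a single 1 at the word's index in the vocabulary
vocab = ["dog", "fish", "cat", "bird", "horse"]
one_hot_cat = np.eye(len(vocab))[vocab.index("cat")]
print(one_hot_cat)  # [0. 0. 1. 0. 0.]
```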
5. Vector Operations (Core of ML)

a) Addition
Combines features.
Used in residual networks, positional encoding.
b) Subtraction
Shows relationships.
Example from Word2Vec:
king - man + woman ≈ queen
c) Scalar Multiplication
Controls intensity/magnitude.
d) Dot Product
Core operation in attention, similarity, projections.
Dot product value:
- high → similar direction
- low → unrelated
- negative → opposite meaning
e) Norm (Length of a vector)
Distance from origin.
Used in normalization, regularization.
f) Normalization (Unit Vector)
Vector scaled to length 1.
Essential for cosine similarity.
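All six operations in one NumPy sketch (the vectors are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

added = a + b                  # addition: combines features
offset = a - b                 # subtraction: relationship between points
scaled = 2.5 * a               # scalar multiplication: changes magnitude only
dot = np.dot(a, b)             # dot product: 32.0 (directions align)
length = np.linalg.norm(a)     # norm: distance from origin, ~3.742
unit = a / np.linalg.norm(a)   # normalization: unit vector

print(np.linalg.norm(unit))    # 1.0 -> ready for cosine similarity
```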
6. Measuring Similarity Between Vectors
Most important concept in embeddings.
a) Euclidean Distance
Straight-line distance.
Good for clustering.
b) Cosine Similarity
Angle between two vectors.
Best for text and embeddings.
cosine = 1 → identical
cosine = 0 → unrelated
cosine = -1 → opposite
c) Dot Product Similarity
Unnormalized; equal to cosine similarity on unit vectors. Used inside transformers.
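The three metrics in NumPy; note that on unit vectors the dot product and cosine similarity coincide, which is why normalization (5f) matters:

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance between the points
    return np.linalg.norm(a - b)

def cosine(a, b):
    # Cosine of the angle between the vectors; magnitude is ignored
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 1.0, 0.0])
b = np.array([2.0, 2.0, 0.0])   # same direction as a, larger magnitude

print(euclidean(a, b))  # 1.414... -> "far" by distance
print(cosine(a, b))     # 1.0     -> identical by direction
print(np.dot(a, b))     # 4.0     -> raw dot-product similarity
```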
7. Embeddings (Vectors that represent meaning)
An embedding is a vector that captures semantics.
Examples:
- GPT embeddings → text meaning
- CLIP embeddings → image+text joint space
- User click history → preference vector
- Recommendation engines → item vectors
Semantically similar items → nearby in vector space.
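A minimal sketch with the sentence-transformers library (the library and model name are assumptions; any embedding model follows the same encode-then-compare pattern):

```python
# Assumes: pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, 384-dim vectors

a, b = model.encode(["The king ruled the land.",
                     "A monarch governed the country."])

# Paraphrases land nearby in the vector space -> cosine close to 1
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```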
8. Vector Space
The environment in which vectors live.
Properties:
- Has dimensions
- Has directions
- Distances define similarity
- Clusters form naturally
Example:
Words related to countries cluster together in a subspace.
9. High-Dimensional Space Intuition
AI vectors live in 100s–1000s of dimensions.
Properties:
- Points spread out → reduces collisions
- Similar meanings form local clusters
- Distances become stable (concentration phenomena)
This is why embeddings are powerful.
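A quick NumPy experiment (point counts and dimensions are arbitrary) that shows distances concentrating as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 100, 1000):
    a = rng.normal(size=(500, d))
    b = rng.normal(size=(500, d))
    dists = np.linalg.norm(a - b, axis=1)   # 500 random pairwise distances
    # Relative spread (std/mean) shrinks as d grows: distances concentrate
    print(d, round(dists.std() / dists.mean(), 3))
```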
10. Vector Databases (Used in RAG)
Store embeddings for fast similarity search.
Examples:
- Pinecone
- Weaviate
- Milvus
- PGVector
- Redis Vector
They use ANN (approximate nearest neighbor) indexing:
- HNSW
- IVF
- PQ / OPQ
A vector DB retrieves relevant memory for LLMs.
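A minimal sketch of ANN search with FAISS, an open-source library not on the list above (index type and sizes are arbitrary choices for illustration):

```python
# Assumes: pip install faiss-cpu
import numpy as np
import faiss

d = 128                                    # embedding dimension (arbitrary)
docs = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(docs)                   # unit vectors: inner product = cosine

index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)  # HNSW graph
index.add(docs)                            # store the document embeddings

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)       # top-5 most similar stored vectors
print(ids[0], scores[0])
```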
11. Attention = Vector Similarity
Transformers calculate:
Attention = softmax(Q ⋅ Kᵀ / √d) ⋅ V
Where:
- Q (query) → what we search
- K (key) → memory
- V (value) → information retrieved
Dot-product similarity drives the entire mechanism.
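The formula written out in NumPy on toy shapes (the sizes are arbitrary):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # dot-product similarity, scaled by sqrt(d)
    weights = softmax(scores)       # each query's distribution over the keys
    return weights @ V              # weighted mix of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))   # 2 queries
K = rng.normal(size=(4, 8))   # 4 keys (memory)
V = rng.normal(size=(4, 8))   # 4 values (retrieved information)
print(attention(Q, K, V).shape)  # (2, 8)
```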
12. Vectorization of Everything
Modern AI converts everything to vectors:
- Text
- Images
- Audio
- Video
- User behavior
- Database rows
- Logs
- Code
- Graphs
Once everything is a vector, the same similarity and retrieval machinery applies across all of these data types.
13. Common Vector Failures
- High-dimensional noise
- Poorly trained embeddings
- Inconsistent embedding models
- Unnormalized vectors
- Wrong similarity metric (see the sketch below)
- Mixed domains (text + images without alignment)
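Two of these failures (unnormalized vectors, wrong similarity metric) in one toy example: ranking by raw dot product lets a long but off-direction vector win, while cosine ranks by direction:

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1.0, 1.0])
doc_a = np.array([1.0, 1.0])    # same direction as the query
doc_b = np.array([10.0, 0.0])   # different direction, large magnitude

print(np.dot(query, doc_a), np.dot(query, doc_b))  # 2.0 vs 10.0 -> b wins (wrong)
print(cosine(query, doc_a), cosine(query, doc_b))  # 1.0 vs 0.707 -> a wins
```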
14. Applications
- Search (“semantic search”)
- RAG (retrieval)
- Recommendations
- Anomaly detection
- Fraud detection
- Clustering
- Classification
- Chatbot memory
- Multi-modal AI (e.g., CLIP)