Vector Embeddings
A vector embedding is a numerical representation of data (text, images, audio) as a list of numbers (a vector) in a high-dimensional space. Machine learning models use embeddings to understand and process complex data by converting it into a format where mathematical operations can reveal relationships, similarities, and patterns.[^1]
Why Embeddings Matter
Traditional approaches represent each word as a one-hot vector: a sparse vector with a single 1 and every other position set to 0. This approach has two major problems: the vectors are as long as the vocabulary and almost entirely zeros, and they capture no semantic similarity, since every pair of distinct words is equally far apart.
| Approach | Dimensionality | Semantic Similarity | Sparsity |
|---|---|---|---|
| One-hot encoding | Vocabulary size (e.g., 50,000) | None captured | Extremely sparse |
| Dense embeddings | 100-1024 dimensions | Captured | Dense |
Dense embeddings solve both problems by representing words in a lower-dimensional space where semantically similar words are close together.
Key Properties of Embeddings
Semantic Similarity
Words or concepts with similar meanings have similar vector representations. The similarity between two vectors, typically measured by cosine similarity, reflects how semantically related they are:
$$ \text{similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{|A| |B|} $$
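As a concrete illustration, the formula above can be computed directly from the dot product and the vector norms. The NumPy sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (illustrative values only).
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.35])
car = np.array([0.1, 0.9, 0.0])

print(cosine_similarity(cat, dog))  # close to 1: related concepts
print(cosine_similarity(cat, car))  # much lower: unrelated concepts
```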
Analogical Reasoning
Good embeddings capture relationships. The famous example:
$$ \text{king} - \text{man} + \text{woman} \approx \text{queen} $$
This works because the vector difference between “king” and “man” captures the concept of royalty, which when added to “woman” yields “queen”.
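This can be checked with off-the-shelf vectors. The sketch below is a minimal illustration using gensim's downloader API and a small pretrained GloVe model; it assumes gensim is installed and the model can be downloaded, and the exact neighbours depend on the vectors used.

```python
import gensim.downloader as api

# Download small pretrained GloVe vectors as a KeyedVectors object.
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman: add "king" and "woman", subtract "man".
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ...)] with these vectors
```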
Types of Embeddings
Static Embeddings
Each word has a single fixed vector representation regardless of context.
| Method | Approach | Training Objective |
|---|---|---|
| Word2Vec | Neural network | Predict a word from its surrounding context (CBOW) or surrounding words from a given word (Skip-gram) |
| GloVe | Matrix factorization | Model word co-occurrence statistics |
| FastText | Neural network | Like Word2Vec but includes subword information |
Limitation: “Bank” has the same embedding whether referring to a financial institution or a river bank.
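Training static embeddings from scratch is straightforward with libraries such as gensim. The sketch below is illustrative only; the toy corpus is far too small to learn anything meaningful, but it shows the shape of the API and the single fixed vector per word.

```python
from gensim.models import Word2Vec

# Toy corpus: a real run needs millions of tokens.
sentences = [
    ["the", "bank", "approved", "the", "loan"],
    ["we", "walked", "along", "the", "river", "bank"],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1: Skip-gram
print(model.wv["bank"].shape)  # (100,) -- one fixed vector, regardless of context
```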
Contextual Embeddings
Each word’s representation depends on its surrounding context.
| Model | Architecture | Key Innovation |
|---|---|---|
| BERT | Transformer encoder | Bidirectional context, masked language modeling |
| GPT | Transformer decoder | Autoregressive, left-to-right context |
| RoBERTa | Transformer encoder | Improved BERT training |
Advantage: “Bank” gets different embeddings in “I deposited money at the bank” vs “I sat by the river bank”.
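A hypothetical sketch of this, using the Hugging Face transformers library with bert-base-uncased (library and model assumed available): it extracts the hidden state at the position of "bank" in each sentence and compares the two vectors.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(word: str, sentence: str) -> torch.Tensor:
    """Return the contextual vector of `word` within `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0].item()
    return hidden[position]

a = embedding_of("bank", "I deposited money at the bank")
b = embedding_of("bank", "I sat by the river bank")
print(torch.cosine_similarity(a, b, dim=0).item())  # noticeably less than 1.0
```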
Embedding Dimensions
Common embedding sizes:
| Model | Dimensions | Use Case |
|---|---|---|
| Word2Vec | 100-300 | Traditional NLP |
| GloVe | 50-300 | Traditional NLP |
| BERT-base | 768 | General purpose |
| BERT-large | 1024 | Higher capacity |
| OpenAI text-embedding-ada-002 | 1536 | Production applications |
| OpenAI text-embedding-3-large | 3072 | High-precision retrieval |
Higher dimensions capture more nuance but require more storage and computation.
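As a rough worked example: storing one million float32 vectors at 1,536 dimensions takes about 1,000,000 × 1,536 × 4 bytes ≈ 6.1 GB before any index overhead, while the same number of 384-dimensional vectors needs roughly 1.5 GB.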
Applications
| Application | How Embeddings Are Used |
|---|---|
| Semantic Search | Query and document embeddings compared for relevance |
| Recommendation systems | User and item embeddings for similarity matching |
| Clustering | Group similar documents or entities |
| Classification | Input features for ML classifiers |
| Anomaly detection | Identify outliers in embedding space |
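As a concrete illustration of the semantic search row above: encode the query and the documents with the same model, then rank documents by cosine similarity. The sketch below assumes the sentence-transformers package and the all-MiniLM-L6-v2 model used later in this article.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The central bank raised interest rates.",
    "We hiked along the river all afternoon.",
    "Stock markets fell after the announcement.",
]
query = "monetary policy news"

# Encode and rank by cosine similarity (vectors are L2-normalized first).
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)
scores = doc_vecs @ query_vec  # dot product of unit vectors == cosine similarity
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```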
Creating Embeddings
Pre-trained Models
For most applications, use pre-trained embedding models:
```python
# Using sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# encode() returns a NumPy array of shape (2, 384): one vector per input string.
embeddings = model.encode(["Hello world", "Hi there"])
```
Fine-tuning
For domain-specific applications, fine-tune embeddings on your data to improve performance on specialized vocabulary and concepts.
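As one possible starting point, sentence-transformers supports fine-tuning a pre-trained model on pairs of related texts. The sketch below is a minimal, illustrative setup; the example pairs and hyperparameters are placeholders, not a recommendation.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder training pairs: in practice, use real (query, relevant passage) pairs from your domain.
train_examples = [
    InputExample(texts=["What causes myocardial infarction?",
                        "A heart attack occurs when blood flow to the heart is blocked."]),
    InputExample(texts=["Symptoms of hypertension",
                        "High blood pressure often has no noticeable symptoms."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)  # other in-batch pairs act as negatives

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```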
Related Topics
- Word2Vec - Original neural network approach to word embeddings
- GloVe - Global vectors from co-occurrence statistics
- BERT - Contextual embeddings from Transformers
- Semantic Search - Using embeddings for retrieval
References
[^1]: Mikolov, T., et al. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781. https://arxiv.org/abs/1301.3781