Positional Embeddings are vectors added to Token Embeddings to provide the model with information about the order or position of tokens in a sequence.
Necessity
Token embeddings alone capture semantic meaning (e.g., “cat” vs “dog”) but ignore position.
- Example: “The cat sits on the mat” means something different if the words are jumbled, yet an order-insensitive combination of their static embeddings (e.g., a sum) is identical in both cases, so the word order is lost (see the sketch below).
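To make this concrete, here is a minimal PyTorch sketch; the token ids and embedding sizes are made-up, GPT-2-like values chosen purely for illustration. It shows that summing token embeddings cannot distinguish the ordered sentence from a jumbled one:

```python
import torch

torch.manual_seed(0)

vocab_size, embed_dim = 50257, 768                  # GPT-2-like sizes (assumed)
token_emb = torch.nn.Embedding(vocab_size, embed_dim)

# Two orderings of the same (hypothetical) token ids
ordered = torch.tensor([464, 3797, 10718, 319, 262, 2603])   # "The cat sits on the mat"
jumbled = torch.tensor([2603, 262, 319, 10718, 3797, 464])   # same ids, shuffled

# Without positional information, an order-insensitive pooling such as a sum
# produces the same vector for both sequences.
print(torch.allclose(token_emb(ordered).sum(dim=0),
                     token_emb(jumbled).sum(dim=0)))          # True
```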
Types of Positional Embeddings
There are two main approaches to encoding position:
- Absolute Positional Embedding: Assigns a unique learned vector to each absolute position in the sequence (used in GPT-2, GPT-3); see the sketch after this list.
- Relative Positional Embedding: Encodes the distance between tokens rather than their absolute positions, which generalizes better to sequence lengths not seen during training.
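Below is a minimal sketch of the absolute approach in the GPT-2 style: a learned positional embedding table with one row per position, added elementwise to the token embeddings. The vocabulary size, context length, and token ids are illustrative assumptions:

```python
import torch

torch.manual_seed(0)

vocab_size, context_len, embed_dim = 50257, 1024, 768    # GPT-2-like sizes (assumed)

token_emb = torch.nn.Embedding(vocab_size, embed_dim)    # one vector per token id
pos_emb   = torch.nn.Embedding(context_len, embed_dim)   # one vector per absolute position

token_ids = torch.tensor([[464, 3797, 10718, 319, 262, 2603]])   # hypothetical batch of 1 sequence
seq_len   = token_ids.shape[1]

# Look up token vectors and add the positional vector for positions 0..seq_len-1
x = token_emb(token_ids) + pos_emb(torch.arange(seq_len))
print(x.shape)   # torch.Size([1, 6, 768])
```

Because the positional table has a fixed number of rows (the context length), this scheme cannot represent positions beyond that limit, which is one motivation for relative schemes.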
