Absolute Positional Embedding is a type of Positional Embedding where a unique embedding vector is assigned to each specific position in the input sequence. This vector is added to the Token Embeddings to convey the exact location of each token.
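To make the addition concrete, here is a minimal PyTorch sketch of a learned absolute positional embedding layer. The class name `AbsolutePositionalEmbedding` and the hyperparameter values (GPT-2-style numbers) are illustrative assumptions, not taken from any specific implementation:

```python
import torch
import torch.nn as nn

class AbsolutePositionalEmbedding(nn.Module):
    """Token embeddings plus one learned vector per position (illustrative sketch)."""

    def __init__(self, vocab_size, context_length, emb_dim):
        super().__init__()
        # One trainable row per token ID in the vocabulary.
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        # One trainable row per position 0 .. context_length - 1.
        self.pos_emb = nn.Embedding(context_length, emb_dim)

    def forward(self, token_ids):
        # token_ids: (batch_size, seq_len) tensor of integer token IDs.
        seq_len = token_ids.shape[1]
        positions = torch.arange(seq_len, device=token_ids.device)
        # Token lookup is (batch, seq_len, emb_dim); position lookup is
        # (seq_len, emb_dim). Broadcasting adds the same positional vector
        # to the corresponding position in every sequence of the batch.
        return self.tok_emb(token_ids) + self.pos_emb(positions)

# Example: a batch of two sequences of length 6 (vocab/context sizes are illustrative).
embed = AbsolutePositionalEmbedding(vocab_size=50257, context_length=1024, emb_dim=768)
out = embed(torch.randint(0, 50257, (2, 6)))
print(out.shape)  # torch.Size([2, 6, 768])
```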
Key Concepts
- Mechanism: Each position in the input sequence (position 1, position 2, and so on) has its own embedding vector, which is added to the token embedding at that position.
- Example: In “The cat sat on the mat” (Sentence 1) and “On the mat the cat sat” (Sentence 2), the word “cat” has the same token embedding $t_{\text{cat}}$ in both sentences. However, in Sentence 1 it sits at position 2 and receives the positional embedding $p_2$, resulting in the input vector $t_{\text{cat}} + p_2$. In Sentence 2 it sits at position 5 and receives $p_5$, resulting in $t_{\text{cat}} + p_5$ (a numeric sketch follows after this list).
- Structure: The positional vectors must have the same dimension as the token embeddings to allow for element-wise addition.
- Usage: Commonly used when a fixed token order is crucial, such as in autoregressive sequence generation. GPT models such as GPT-2 and GPT-3 use absolute positional embeddings that are optimized during training.
- Learned vs. Fixed: While the original Attention Is All You Need paper used a fixed sinusoidal formula (sketched at the end of this section), GPT models learn these embeddings as part of the training process.
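To make the “cat” example above concrete, the sketch below uses a made-up token ID and a tiny embedding dimension purely for illustration. The token embedding is identical in both sentences, while the positional embedding, and therefore the sum fed to the model, differs between position 2 and position 5:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb_dim = 4  # tiny dimension so the vectors are easy to print

tok_emb = nn.Embedding(10, emb_dim)   # toy vocabulary of 10 tokens
pos_emb = nn.Embedding(8, emb_dim)    # supports positions 1..8

cat_id = torch.tensor(3)              # made-up token ID for "cat"
t_cat = tok_emb(cat_id)               # same token vector in both sentences

p2 = pos_emb(torch.tensor(1))         # position 2 (index 1 when counting from 0)
p5 = pos_emb(torch.tensor(4))         # position 5 (index 4 when counting from 0)

print(t_cat + p2)   # input vector for "cat" in "The cat sat on the mat"
print(t_cat + p5)   # different input vector for "cat" in "On the mat the cat sat"
```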

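For comparison with the learned variant, the fixed sinusoidal formula from Attention Is All You Need assigns position pos, in dimension pair 2i, the values sin(pos / 10000^(2i/d)) and cos(pos / 10000^(2i/d)), with no trainable parameters. A small sketch (assuming an even embedding dimension) follows:

```python
import torch

def sinusoidal_positional_encoding(context_length, emb_dim):
    """Fixed (non-learned) sinusoidal encoding from 'Attention Is All You Need'."""
    positions = torch.arange(context_length, dtype=torch.float32).unsqueeze(1)  # (L, 1)
    # 10000^(2i/d) for each even dimension index 2i.
    div_term = torch.pow(10000.0, torch.arange(0, emb_dim, 2, dtype=torch.float32) / emb_dim)
    pe = torch.zeros(context_length, emb_dim)
    pe[:, 0::2] = torch.sin(positions / div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(positions / div_term)   # odd dimensions
    return pe  # added to token embeddings, just like the learned table

pe = sinusoidal_positional_encoding(context_length=1024, emb_dim=768)
print(pe.shape)  # torch.Size([1024, 768])
```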