Input Embeddings are the final vector representations fed into the Transformer Block of an LLM. They are formed by adding Token Embeddings and Positional Embeddings together element-wise.
The data pre-processing pipeline aims to produce these input embeddings from raw text, ensuring that both the semantic meaning (from token embeddings) and the positional information (from positional embeddings) are captured before being processed by the model layers.
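As a concrete illustration, here is a minimal sketch of this pipeline, assuming PyTorch and GPT-2-style hyperparameters; the layer and variable names are illustrative, not a specific library API.

```python
# Minimal sketch: token IDs -> token embeddings + positional embeddings -> input embeddings
import torch
import torch.nn as nn

vocab_size = 50257   # GPT-2's BPE vocabulary size
context_length = 4   # illustrative context length
embed_dim = 256      # illustrative embedding dimension

token_emb_layer = nn.Embedding(vocab_size, embed_dim)
pos_emb_layer = nn.Embedding(context_length, embed_dim)

# token_ids: one batch of already-tokenized text, shape [batch_size, context_length]
token_ids = torch.randint(0, vocab_size, (8, context_length))

token_embeddings = token_emb_layer(token_ids)                 # [8, 4, 256]
pos_embeddings = pos_emb_layer(torch.arange(context_length))  # [4, 256]

input_embeddings = token_embeddings + pos_embeddings          # broadcast to [8, 4, 256]
print(input_embeddings.shape)                                 # torch.Size([8, 4, 256])
```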
Tensor Dimensions
In a practical, efficient implementation (such as GPT-2):
- Resulting Shape: If we have a Batch Size of 8, a Context Length of 4, and an embedding dimension of 256, the final Input Embeddings tensor will have the shape `[8, 4, 256]`.
- This shape is derived from adding the `[8, 4, 256]` token embedding tensor and the `[4, 256]` positional embedding tensor (via Broadcasting), as shown in the sketch below.
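The broadcasting step alone can be verified with a small PyTorch sketch (the random tensors here just stand in for the real embeddings): the `[4, 256]` positional tensor is implicitly expanded across the batch dimension of the `[8, 4, 256]` token tensor before the element-wise addition.

```python
import torch

token_embeddings = torch.randn(8, 4, 256)  # [batch_size, context_length, embed_dim]
pos_embeddings = torch.randn(4, 256)       # [context_length, embed_dim]

# [4, 256] is treated as [1, 4, 256] and broadcast over the batch dimension
input_embeddings = token_embeddings + pos_embeddings
print(input_embeddings.shape)              # torch.Size([8, 4, 256])
```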
