The Attention Mechanism is a fundamental concept in Deep Learning and LLMs that allows a model to weigh the importance of different parts of the input when generating an output. It is a key component of the Transformer architecture.
Key Concepts
- Purpose: It gives the LLM selective access to the whole input sequence, enabling it to focus on relevant context anywhere in that sequence rather than only on the immediately preceding tokens.
- Function: It computes a weight (attention score) for each token in the context, determining how much influence that token has on the prediction of the next token.
- Components: It relies on vectors known as Query, Key, and Value (QKV) to calculate these interactions, as shown in the sketch after this list.
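A minimal sketch of how QKV vectors turn into attention weights, using scaled dot-product attention in NumPy. The dimensions and the random projection matrices (W_q, W_k, W_v) are illustrative assumptions, not part of any specific model.

```python
# Scaled dot-product attention sketch (NumPy). Names and sizes are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (seq_len, seq_len) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                               # weighted sum of value vectors

# Toy example: 4 tokens, embedding dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v            # project embeddings into Query/Key/Value spaces
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))                           # each row sums to 1: how much each token attends to the others
```

Each row of the weight matrix is a distribution over the context, so the output for a token is a mixture of all value vectors, dominated by the tokens it attends to most.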
Types
- Self-Attention Mechanism: Each position looks at all other positions within the same sequence to compute its representation of the sequence.
- Causal Attention: Each position looks only at itself and previous positions within the same sequence; future positions are masked out, which is what autoregressive next-token prediction requires.
- Multi-head Attention: Multiple attention mechanisms run in parallel, each with its own QKV projections, to capture various types of relationships; see the sketch after this list.
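A sketch of causal masking and multi-head attention built on the same scaled dot-product idea. The head count, dimensions, and projection matrices are assumptions chosen for illustration.

```python
# Causal mask + multi-head attention sketch (NumPy). All names/sizes are illustrative.
import numpy as np

def attention(Q, K, V, causal=False):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # Mask out future positions so each token attends only to itself and earlier tokens.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head_attention(X, num_heads, d_head, rng, causal=True):
    d_model = X.shape[-1]
    outputs = []
    for _ in range(num_heads):
        # Each head has its own Q/K/V projections and attends independently.
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        outputs.append(attention(X @ W_q, X @ W_k, X @ W_v, causal=causal))
    # Concatenate the heads and project back to the model dimension.
    W_o = rng.normal(size=(num_heads * d_head, d_model))
    return np.concatenate(outputs, axis=-1) @ W_o

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                              # 5 tokens, model dimension 8
out = multi_head_attention(X, num_heads=2, d_head=4, rng=rng)
print(out.shape)                                         # (5, 8): one contextualized vector per token
```

The causal mask is what separates self-attention from causal attention here, and running several independently projected heads before recombining them is the multi-head part.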
