The Attention Mechanism is a fundamental concept in Deep Learning and LLMs that allows a model to weigh the importance of different parts of the input when generating an output. It is a key component of the Transformer architecture.
Key Concepts
- Purpose: It gives the LLM selective access to the whole input sequence, enabling it to focus on relevant context anywhere in that sequence rather than only on the immediately preceding tokens.
- Function: It computes a weight (attention score) for each token in the context, determining how much influence that token has on the prediction of the next token.
- Components: It relies on vectors known as Query, Key, and Value (QKV) to calculate these interactions, as shown in the sketch after this list.
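A minimal sketch of how QKV vectors turn into attention weights, using scaled dot-product attention in NumPy. The dimensions and the random projection matrices (W_q, W_k, W_v) are illustrative assumptions, not part of any specific model.

```python
# Scaled dot-product attention sketch (NumPy). Names and sizes are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (seq_len, seq_len) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                               # weighted sum of value vectors

# Toy example: 4 tokens, embedding dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v            # project embeddings into Query/Key/Value spaces
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))                           # each row sums to 1: how much each token attends to the others
```

Each row of the weight matrix is a distribution over the context, so the output for a token is a mixture of all value vectors, dominated by the tokens it attends to most.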
Types
- Self-Attention Mechanism: Each position looks at all other positions within the same sequence to compute its representation of the sequence.
- Causal Attention: Each position looks only at itself and previous positions within the same sequence; future positions are masked out, which is what autoregressive next-token prediction requires.
- Multi-head Attention: Multiple attention mechanisms run in parallel, each with its own QKV projections, to capture various types of relationships; see the sketch after this list.
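A sketch of causal masking and multi-head attention built on the same scaled dot-product idea. The head count, dimensions, and projection matrices are assumptions chosen for illustration.

```python
# Causal mask + multi-head attention sketch (NumPy). All names/sizes are illustrative.
import numpy as np

def attention(Q, K, V, causal=False):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # Mask out future positions so each token attends only to itself and earlier tokens.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head_attention(X, num_heads, d_head, rng, causal=True):
    d_model = X.shape[-1]
    outputs = []
    for _ in range(num_heads):
        # Each head has its own Q/K/V projections and attends independently.
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        outputs.append(attention(X @ W_q, X @ W_k, X @ W_v, causal=causal))
    # Concatenate the heads and project back to the model dimension.
    W_o = rng.normal(size=(num_heads * d_head, d_model))
    return np.concatenate(outputs, axis=-1) @ W_o

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                              # 5 tokens, model dimension 8
out = multi_head_attention(X, num_heads=2, d_head=4, rng=rng)
print(out.shape)                                         # (5, 8): one contextualized vector per token
```

The causal mask is what separates self-attention from causal attention here, and running several independently projected heads before recombining them is the multi-head part.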
