A Context Vector is the output of the Attention Mechanism. Unlike an input embedding vector, which only captures the semantic meaning of a single token in isolation, a context vector is “enriched” because it incorporates information about how that token relates to all other tokens in the input sequence.
Calculation
The context vector () is calculated as the weighted sum of Value vectors (), weighted by the Attention Weights ().
For example, in the sentence “Your journey starts with one step,” the context vector for “journey” is the weighted sum of the value vectors for all tokens in the sequence (“Your”, “starts”, “with”, “one”, “step”).
If “journey” pays high attention to “starts” and “step,” their value vectors will contribute significantly to the final context vector , enriching “journey” with the specific context of beginning a process.
