Broadcasting (Tensor Operation)

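Broadcasting is a mechanism in libraries like PyTorch and NumPy that allows element-wise operations on tensors of different shapes. Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1 (or missing), and the size-1 dimension is then virtually stretched to match the other, without copying data.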
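
As a minimal sketch of the rule, the NumPy snippet below adds a vector and a column to a 2x3 matrix. The array names and values are made up for illustration; the same expressions behave the same way with PyTorch tensors.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])     # shape (3,)

# b is treated as shape (1, 3) and stretched along axis 0 to (2, 3).
c = a + b

col = np.array([[100.0], [200.0]])   # shape (2, 1)

# col is stretched along axis 1, so each row gets its own offset.
d = a + col

print(c.shape, d.shape)
```

Shapes such as (2, 3) and (2,) are not compatible under this rule, because the trailing sizes 3 and 2 differ and neither is 1; the operation raises an error instead of guessing.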

Context in LLMs

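In the context of creating Input Embeddings, broadcasting is used to add Positional Embeddings to the batched Token Embeddings. The Positional Embeddings typically have shape (Context Size, embedding dimension) with no batch dimension, while the Token Embeddings have shape (batch size, Context Size, embedding dimension), so the same positional values are added to every sequence in the batch.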

Example Calculation
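
A small worked example of the embedding addition described above, written with NumPy. The sizes (batch of 2, context size of 4, embedding dimension of 3) and the random values are assumptions chosen only to keep the shapes easy to follow; a real model would use learned embedding tables.

```python
import numpy as np

# Hypothetical sizes, chosen for readability.
batch_size, context_size, emb_dim = 2, 4, 3

rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(batch_size, context_size, emb_dim))  # (2, 4, 3)
pos_embeddings = rng.normal(size=(context_size, emb_dim))                # (4, 3)

# (2, 4, 3) + (4, 3): pos_embeddings is treated as (1, 4, 3)
# and reused for every sequence in the batch.
input_embeddings = token_embeddings + pos_embeddings

print(input_embeddings.shape)
```

Every sequence in the batch receives the same positional offsets, which is the intended behaviour: position 0 means the same thing regardless of which sequence it appears in.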

Application in Softmax

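In the coding implementation of the Softmax Activation Function, broadcasting is critical for operations like subtracting the maximum value or dividing by the sum. Both the maximum and the sum are computed per row, so they have one fewer element along the reduced dimension than the input; keeping that dimension with size 1 lets the result broadcast back across each row.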
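
The following is a minimal sketch of a numerically stable softmax in NumPy, not a reference implementation. The function name `softmax` and the sample logits are illustrative. `keepdims=True` keeps the reduced axis with size 1 so that the subtraction and division broadcast row by row (PyTorch spells the equivalent argument `keepdim`).

```python
import numpy as np

def softmax(logits, axis=-1):
    # Per-row maximum, shape (..., 1); subtracting it avoids overflow in exp.
    max_vals = logits.max(axis=axis, keepdims=True)
    exps = np.exp(logits - max_vals)           # (rows, n) - (rows, 1)
    # Per-row sum, shape (..., 1); dividing normalizes each row to sum to 1.
    return exps / exps.sum(axis=axis, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # second row would overflow a naive exp
probs = softmax(logits)
print(probs.sum(axis=-1))
```

Without `keepdims=True`, the maximum would have shape (2,), which does not broadcast against (2, 3), so the subtraction would fail with a shape error.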
