Broadcasting is a mechanism in libraries like PyTorch and NumPy that allows operations on tensors of different shapes by automatically expanding their dimensions to match.
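For instance, a minimal PyTorch sketch of the general rule (dimensions are aligned from the right, and missing or size-1 dimensions are expanded); the specific values are illustrative:

```python
import torch

# A (2, 3) matrix plus a (3,) vector: the vector is broadcast across both rows.
matrix = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])
vector = torch.tensor([10., 20., 30.])

result = matrix + vector   # shape (2, 3); the vector is added to each row
print(result)
# tensor([[11., 22., 33.],
#         [14., 25., 36.]])
```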
Context in LLMs
In the context of creating Input Embeddings, broadcasting is used to add Positional Embeddings (which typically have a smaller shape corresponding to the Context Size) to the batched Token Embeddings.
Example Calculation
- Token Embeddings Shape: [Batch Size, Context Length, Embedding Dim] = 8 x 4 x 256
- Positional Embeddings Shape: [Context Length, Embedding Dim] = 4 x 256
- Operation: When adding these two tensors, PyTorch automatically "broadcasts" the 4 x 256 positional matrix across the 8 batches of the token embeddings.
- Effectively, the same 4 x 256 positional vectors are added to each of the 8 samples in the batch (see the sketch below).
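A minimal PyTorch sketch of this addition, assuming an illustrative vocabulary size of 50257 and hypothetical embedding layers; only the shapes match the example above:

```python
import torch

batch_size, context_length, embedding_dim = 8, 4, 256
vocab_size = 50257  # illustrative assumption

# Hypothetical embedding layers mirroring the shapes above.
token_embedding_layer = torch.nn.Embedding(vocab_size, embedding_dim)
pos_embedding_layer = torch.nn.Embedding(context_length, embedding_dim)

token_ids = torch.randint(0, vocab_size, (batch_size, context_length))
token_embeddings = token_embedding_layer(token_ids)                  # (8, 4, 256)
pos_embeddings = pos_embedding_layer(torch.arange(context_length))   # (4, 256)

# Broadcasting: the (4, 256) positional matrix is added to every sample in the batch.
input_embeddings = token_embeddings + pos_embeddings                 # (8, 4, 256)
print(input_embeddings.shape)  # torch.Size([8, 4, 256])
```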
Application in Softmax
In the coding implementation of the Softmax Activation Function, broadcasting is critical for operations like subtracting the maximum value or dividing by the sum.
- Keep Dimensions: When calculating the max or sum along a specific axis (e.g., `axis=1` for rows), it is crucial to use `keepdims=True`. This retains the dimensions (e.g., keeping the shape as `(3, 1)` instead of reducing it to `(3,)`), allowing the result to be correctly broadcast back against the original `(3, 3)` matrix for element-wise subtraction or division (see the sketch after this list).
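A minimal NumPy sketch of this pattern; the 3 x 3 `logits` matrix is an illustrative assumption:

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max for numerical stability.
    # keepdims=True keeps the shape (3, 1) instead of (3,), so the result
    # broadcasts back against the (3, 3) matrix element-wise.
    shifted = x - np.max(x, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [1.0, 3.0, 0.2],
                   [0.5, 0.5, 0.5]])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1.0
```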
