Vector Embedding

A Vector Embedding represents a word or token as a vector of numbers in a high-dimensional space, so that mathematical operations on those vectors can reflect semantic meaning.

Concept

If we imagine a vector whose dimensions correspond to features like “has a tail”, “is edible”, or “is a pet”, then words with similar properties will have similar vector representations.
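
As a toy illustration of this idea (hand-crafted feature dimensions and made-up values, purely for intuition; real embeddings are learned, not assigned by hand):

```python
import numpy as np

# Hypothetical hand-crafted dimensions: [has a tail, is edible, is a pet]
cat      = np.array([0.9, 0.1, 0.8])
dog      = np.array([0.9, 0.1, 0.9])
sandwich = np.array([0.0, 1.0, 0.0])

# Words with similar properties end up close together...
print(np.linalg.norm(cat - dog))       # ~0.1
# ...while unrelated words end up far apart.
print(np.linalg.norm(cat - sandwich))  # ~1.5
```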

Semantic Properties

Well-trained vector embeddings exhibit remarkable properties:

  1. Similarity: The distance between two vectors reflects semantic distance. “Man” and “Woman” are closer together than “Semiconductor” and “Earthworm”.
  2. Arithmetic: You can perform operations like King - Man + Woman; the resulting vector is closest to Queen (see the sketch after this list).
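
A minimal sketch of both properties, using hand-constructed 2-D toy vectors (the dimensions “royalty” and “gender” are purely illustrative; real learned embeddings have hundreds of dimensions):

```python
import numpy as np

# Toy vectors with made-up dimensions: [royalty, gender]
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

# 1. Similarity: distance between vectors reflects semantic distance.
dist = lambda a, b: np.linalg.norm(vectors[a] - vectors[b])
print(dist("man", "woman"))   # 2.0
print(dist("king", "woman"))  # ~2.24 -- farther apart

# 2. Arithmetic: king - man + woman lands closest to queen.
result = vectors["king"] - vectors["man"] + vectors["woman"]
closest = min(vectors, key=lambda w: np.linalg.norm(vectors[w] - result))
print(closest)  # queen
```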

Role in LLMs

In the context of Large Language Models, Token Embeddings typically represent the third step in the workflow, following tokenization and the conversion of tokens to token IDs.
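
A minimal sketch of that three-step workflow, assuming PyTorch's nn.Embedding as the lookup table; the whitespace tokenizer and tiny vocabulary here are purely illustrative, not how a real LLM tokenizes text:

```python
import torch
import torch.nn as nn

# Step 1: tokenization (toy whitespace tokenizer, illustrative only)
text = "the cat sat"
tokens = text.split()                                   # ["the", "cat", "sat"]

# Step 2: tokens -> token IDs via a (toy) vocabulary
vocab = {"the": 0, "cat": 1, "sat": 2}
token_ids = torch.tensor([vocab[t] for t in tokens])    # tensor([0, 1, 2])

# Step 3: token IDs -> embedding vectors (a learnable lookup table)
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vectors = embedding(token_ids)                          # shape: (3, 8)
print(vectors.shape)
```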

Comparison with One-Hot Encoding

Unlike simpler methods such as One-Hot Encoding or random number assignment, embeddings convert individual tokens into dense, continuous vector representations whose values are learned, so tokens with similar meanings end up with similar vectors. One-hot vectors, by contrast, are sparse and all equidistant from one another, and therefore encode no semantic information.
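
To make the contrast concrete (toy values, for illustration only): every pair of one-hot vectors is the same distance apart, while dense embeddings can place related words closer together.

```python
import numpy as np

# One-hot encoding: every word is exactly as far from every other word.
cat_oh = np.array([1, 0, 0])
dog_oh = np.array([0, 1, 0])
car_oh = np.array([0, 0, 1])
print(np.linalg.norm(cat_oh - dog_oh))    # ~1.41
print(np.linalg.norm(cat_oh - car_oh))    # ~1.41 -- no similarity information

# Dense (toy) embeddings: related words can be closer than unrelated ones.
cat_emb = np.array([0.9, 0.7])
dog_emb = np.array([0.8, 0.6])
car_emb = np.array([-0.5, 0.2])
print(np.linalg.norm(cat_emb - dog_emb))  # ~0.14
print(np.linalg.norm(cat_emb - car_emb))  # ~1.49
```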

Why it Matters
