Pre-training

Table of Contents

Pre-training
Core Mechanism
Types of Pre-training
Role in Large Language Models

Pre-training is a machine learning technique, and a core part of Transfer Learning, in which a model is first trained on a large dataset to learn general features and patterns before being adapted (or fine-tuned) for a specific downstream task.

The core intuition is that it is easier to solve a specific problem if you already have a general understanding of the domain.

Core Mechanism

The workflow typically consists of two stages (a minimal code sketch follows the list):

  1. Pre-training: The model is trained on a massive amount of generic data (e.g., the entire internet, ImageNet) to learn broad representations. This is often the most computationally expensive phase.
  2. Fine-tuning: The pre-trained “base model” is then updated using a smaller, task-specific dataset to specialize its performance.
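A rough sketch of this two-stage workflow in PyTorch (the layer sizes, the 1000-class pre-training objective, and the random tensors standing in for real datasets are all illustrative assumptions):

```python
import torch
import torch.nn as nn

# Shared backbone that learns general representations during pre-training.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))

# Stage 1: Pre-training on a large, generic dataset (random tensors as stand-ins).
pretrain_head = nn.Linear(256, 1000)            # e.g. a broad 1000-class objective
model = nn.Sequential(backbone, pretrain_head)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                            # in practice: many passes over massive data
    x = torch.randn(64, 128)                    # placeholder generic inputs
    y = torch.randint(0, 1000, (64,))           # placeholder generic targets
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: Fine-tuning the same backbone on a small, task-specific dataset.
finetune_head = nn.Linear(256, 3)               # e.g. a 3-class downstream task
model = nn.Sequential(backbone, finetune_head)  # reuse the pre-trained backbone
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # a smaller learning rate is typical
for _ in range(10):                             # far fewer steps and examples
    x = torch.randn(16, 128)
    y = torch.randint(0, 3, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The key point is that the backbone's weights carry over from Stage 1 to Stage 2, so the downstream task starts from general representations rather than from random initialization.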

Types of Pre-training

1. Unsupervised / Self-Supervised

This is the dominant paradigm for Large Language Models. The model learns from the internal structure of the data without explicit labels, for example by predicting the next token in a sequence or reconstructing masked-out tokens.
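A minimal sketch of the next-token-prediction case (the tiny vocabulary and model sizes, the LSTM standing in for a Transformer, and the random token IDs are all illustrative assumptions):

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.LSTM(d_model, d_model, batch_first=True)    # stand-in for a Transformer
to_logits = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # placeholder for real text

# The target at each position is simply the next token in the same sequence,
# so the "labels" come for free from the raw data.
inputs, targets = tokens[:, :-1], tokens[:, 1:]
hidden, _ = encoder(embed(inputs))
logits = to_logits(hidden)                                # (batch, seq_len - 1, vocab_size)

loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()   # one self-supervised training step
```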

2. Supervised

Common in older Computer Vision workflows, where a model such as a ResNet is first trained on a large labeled dataset (e.g., ImageNet classification) and then adapted to the target task.
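For instance, a ResNet whose weights come from supervised pre-training on labeled ImageNet can be reused as the starting point for a new vision task. A small torchvision sketch (the 10-class downstream task is a made-up example, and loading the weights triggers a download):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained with supervision on labeled ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained backbone and swap in a new head for the downstream task;
# during fine-tuning only the new head is updated.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)
```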

Role in Large Language Models

In the context of LLMs, Pre-training is the phase where the model gains its “intelligence” or base capabilities.
