Supervised Fine-Tuning (SFT), often called Instruction Tuning, is the process of training a pre-trained Large Language Model on a smaller, high-quality dataset of labeled examples (typically instruction-response pairs) to adapt it to a specific task or behavior.
Purpose
- Instruction Following: While Pre-training teaches a model to predict the next word (probabilistic completion), SFT teaches it to follow user commands and act as a helpful assistant.
- Domain Adaptation: Adapting a general model to a specific domain (e.g., medical, legal, coding) by feeding it domain-specific Question-Answer pairs.
Process
- Input: A pre-trained Foundation Model (base model).
- Data: A dataset of prompts (inputs) and desired completions (labels).
- Example:
{"prompt": "Summarize this article...", "completion": "The article discusses..."}
- Training: The model's parameters are updated to minimize the difference (typically the cross-entropy loss) between its predicted tokens and the labeled completion.
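The data step above can be sketched with a toy example: a prompt-completion pair is concatenated into one token sequence, and the loss is computed only on the completion, with prompt positions masked using the conventional ignore index -100. The whitespace "tokenizer" here is a hypothetical stand-in for illustration; real setups use the model's own tokenizer.

```python
# Minimal sketch: turning a prompt/completion pair into SFT training labels.
# Assumption: a toy whitespace tokenizer, not a real model tokenizer.

IGNORE_INDEX = -100  # conventional label value excluded from the loss


def toy_tokenize(text, vocab):
    """Map whitespace-split words to integer ids, growing the vocab as needed."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def build_example(prompt, completion, vocab):
    """Concatenate prompt and completion; mask prompt positions in the labels."""
    prompt_ids = toy_tokenize(prompt, vocab)
    completion_ids = toy_tokenize(completion, vocab)
    input_ids = prompt_ids + completion_ids
    # Loss is computed only on the completion: prompt labels are ignored.
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_example(
    "Summarize this article ...",
    "The article discusses ...",
    vocab,
)
print(input_ids)
print(labels)
```

During training, the model predicts every position of input_ids, but positions labeled -100 contribute nothing to the loss, so the model learns to produce the completion rather than to echo the prompt.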
