Supervised Fine-Tuning (SFT), often called Instruction Tuning, is the process of training a pre-trained Large Language Model on a smaller, high-quality dataset of labeled examples (typically instruction-response pairs) to adapt it to a specific task or behavior.
Purpose
- Instruction Following: While Pre-training teaches a model to predict the next word (probabilistic completion), SFT teaches it to follow user commands and act as a helpful assistant.
- Domain Adaptation: Adapting a general model to a specific domain (e.g., medical, legal, coding) by feeding it domain-specific Question-Answer pairs.
Process
- Input: A pre-trained Foundation Model (base model).
- Data: A dataset of prompts (inputs) and desired completions (labels).
- Example:
{"prompt": "Summarize this article...", "completion": "The article discusses..."}
- Training: The model's parameters are updated to minimize the difference (typically the cross-entropy loss) between its predicted tokens and the labeled completion.
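The data step above can be sketched with a toy example: a prompt-completion pair is concatenated into one token sequence, and the loss is computed only on the completion, with prompt positions masked using the conventional ignore index -100. The whitespace "tokenizer" here is a hypothetical stand-in for illustration; real setups use the model's own tokenizer.

```python
# Minimal sketch: turning a prompt/completion pair into SFT training labels.
# Assumption: a toy whitespace tokenizer, not a real model tokenizer.

IGNORE_INDEX = -100  # conventional label value excluded from the loss


def toy_tokenize(text, vocab):
    """Map whitespace-split words to integer ids, growing the vocab as needed."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def build_example(prompt, completion, vocab):
    """Concatenate prompt and completion; mask prompt positions in the labels."""
    prompt_ids = toy_tokenize(prompt, vocab)
    completion_ids = toy_tokenize(completion, vocab)
    input_ids = prompt_ids + completion_ids
    # Loss is computed only on the completion: prompt labels are ignored.
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_example(
    "Summarize this article ...",
    "The article discusses ...",
    vocab,
)
print(input_ids)
print(labels)
```

During training, the model predicts every position of input_ids, but positions labeled -100 contribute nothing to the loss, so the model learns to produce the completion rather than to echo the prompt.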
