Parameter-Efficient Fine-Tuning (PEFT) is a family of techniques used to fine-tune large pre-trained models by updating only a small subset of parameters (or adding a small number of new trainable parameters), while keeping the vast majority of the original pre-trained weights frozen.
Core Problem Solved
Full fine-tuning of large models (e.g., 70B parameters) is extremely expensive on three fronts:
- Compute: Requires massive GPU processing power to calculate gradients for all weights.
- Memory: Requires storing optimizer states for every single parameter (often 3-4x the model size; see the back-of-the-envelope sketch after this list).
- Storage: Producing a separate full copy of the model for every downstream task is inefficient.
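As a rough illustration of the memory bullet above, the sketch below estimates full fine-tuning memory under one common setup: fp16 weights and gradients plus fp32 master weights and two fp32 Adam states per parameter. The byte layout is an assumption for illustration; real numbers vary with optimizer, precision, sharding, and activation memory.

```python
# Back-of-the-envelope memory for full fine-tuning with Adam in mixed precision.
# Assumed layout (illustrative): fp16 weights (2 B) + fp16 gradients (2 B)
# + fp32 master weights (4 B) + two fp32 Adam states (4 B + 4 B) per parameter.
def full_finetune_memory_gb(n_params: float) -> float:
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return n_params * bytes_per_param / 1e9

print(f"{full_finetune_memory_gb(70e9):,.0f} GB")  # ~1,120 GB, before activations
```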
Key Techniques
- Low-Rank Adaptation (LoRA): Injects small, trainable rank-decomposition matrices into linear layers, so the learned weight update has low rank while the original weights stay frozen (first sketch below).
- Adapters: Inserts small trainable neural network layers (typically bottleneck MLPs) between existing frozen layers (second sketch below).
- Prompt Tuning: Adds trainable “virtual tokens” to the input prompt, leaving the model weights entirely untouched (third sketch below).
- Quantized Low-Rank Adaptation (QLoRA): Combines LoRA with aggressive 4-bit quantization of the frozen base weights to further reduce memory usage (fourth sketch below).
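To make these concrete, here are minimal PyTorch sketches of each idea. First, LoRA: a frozen linear layer plus a trainable low-rank update, W + (alpha/r)·BA. The hyperparameters (r=8, alpha=16) are common defaults, not prescriptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # A gets small random values, B starts at zero, so the adapted layer
        # is exactly the frozen base layer at initialization.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Only A and B are trained: for a 4096x4096 projection at r=8, that is ~65K trainable parameters instead of ~16.8M.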
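Second, a bottleneck adapter: a small down-project/up-project block with a residual connection, initialized so it acts as an identity. The bottleneck width is an illustrative choice.

```python
class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # near-identity at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))
```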
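Third, prompt tuning reduces to learning a small matrix of virtual-token embeddings that is prepended to the input embeddings before they reach the frozen model; `n_virtual` below is an illustrative choice.

```python
class PromptTuning(nn.Module):
    """Prepends trainable 'virtual token' embeddings; base model weights stay frozen."""
    def __init__(self, d_model: int, n_virtual: int = 20):
        super().__init__()
        self.virtual = nn.Parameter(torch.randn(n_virtual, d_model) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, d_model)
        batch = token_embeds.size(0)
        prefix = self.virtual.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)
```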
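Finally, QLoRA is usually set up through the Hugging Face transformers/peft/bitsandbytes stack rather than by hand. The sketch below shows the common pattern; the model ID and target module names are illustrative, and argument names should be checked against your installed library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```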
Benefits
- Reduced Hardware Requirements: Enables fine-tuning massive models on consumer hardware (e.g., a single GPU).
- Modularity: You can have one widely shared “Base Model” and swap small (megabyte-sized) “Adapter” files for different tasks (e.g., one adapter for coding, one for creative writing); see the sketch after this list.
- Less Catastrophic Forgetting: Since most weights are frozen, the model is less likely to forget its pre-trained general knowledge.
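As an example of the modularity benefit, the Hugging Face peft library lets a single base model host several adapters and switch between them at runtime. The model ID, adapter paths, and adapter names below are hypothetical.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative

# Attach one adapter, then register a second and switch between them.
model = PeftModel.from_pretrained(base, "adapters/coding", adapter_name="coding")  # hypothetical paths
model.load_adapter("adapters/creative-writing", adapter_name="writing")
model.set_adapter("writing")  # swap tasks without reloading the multi-GB base model
```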
