Hands-On: Implementing LoRA and Fine-Tuning

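⚠️ Time-sensitivity note: this chapter touches on frontier models, pricing, leaderboards, and similar information that can change quickly between versions; treat the original papers, official release pages, and API documentation as authoritative.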

Goal: implement LoRA from scratch and fine-tune a model with it on a real task.

Project Structure

Text Only

实践-LoRA实现/
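└── README.md              # This file (all reference code is in the appendix)

The code files (lora.py, requirements.txt) are reproduced in the appendix at the end of this document; lora.py is given there as a skeleton to fill in. Implement everything yourself first, and consult the appendix only when you get stuck.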

Task Description

Task 1: Implement the Core LoRA Components (estimated 2-3 hours)

1.1 Implement LoRALayer

Implement the following in lora.py:

Python

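class LoRALayer(nn.Module):
    """
    Core implementation of a LoRA layer.

    Key points:
    1. Create the low-rank matrices A ([in_features, rank]) and B ([rank, out_features])
    2. Initialize them correctly: A random (Gaussian or Kaiming-uniform), B all zeros
    3. Forward pass: (x @ A @ B) * scaling, with scaling = lora_alpha / rank

    Input:  [batch, seq_len, in_features]
    Output: [batch, seq_len, out_features]
    """
    pass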
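
If you get stuck on 1.1, the following is one possible minimal sketch rather than the reference solution. It assumes the shape convention A: [in_features, rank], B: [rank, out_features]; the Gaussian standard deviation of 0.02 is an illustrative choice that this document does not prescribe.

```python
import torch
import torch.nn as nn


class LoRALayer(nn.Module):
    """Low-rank branch computing (x @ A @ B) * (lora_alpha / rank)."""

    def __init__(self, in_features: int, out_features: int,
                 rank: int = 16, lora_alpha: float = 32.0):
        super().__init__()
        self.scaling = lora_alpha / rank
        # A: [in_features, rank]. Random init; std=0.02 is an assumption.
        self.A = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        # B: [rank, out_features]. Zeros, so the branch outputs exactly 0 at step 0.
        self.B = nn.Parameter(torch.zeros(rank, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # [batch, seq_len, in] @ [in, rank] @ [rank, out] -> [batch, seq_len, out]
        return (x @ self.A @ self.B) * self.scaling


layer = LoRALayer(in_features=64, out_features=128, rank=4, lora_alpha=8)
x = torch.randn(2, 10, 64)
out = layer(x)
num_params = sum(p.numel() for p in layer.parameters())
print(out.shape, num_params)
```

Because B starts at zero, the adapted model behaves exactly like the pretrained model before the first update, while A is random so that gradients can still reach B.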

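1.2 Implement LinearWithLoRA

Python

class LinearWithLoRA(nn.Module):
    """
    A linear layer with a LoRA branch.

    Key points:
    1. Freeze the parameters of the original linear layer
    2. Add a LoRA layer
    3. Forward pass: original output + LoRA output
    """
    pass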
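
The three points above can be sketched as follows. This is a hedged illustration, not the reference solution: the compact LoRALayer is repeated only so the block runs on its own, and the attribute names `linear` and `lora` are assumptions.

```python
import torch
import torch.nn as nn


class LoRALayer(nn.Module):
    """Compact low-rank branch, repeated here so the block is self-contained."""

    def __init__(self, in_features, out_features, rank=16, lora_alpha=32):
        super().__init__()
        self.scaling = lora_alpha / rank
        self.A = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(rank, out_features))

    def forward(self, x):
        return (x @ self.A @ self.B) * self.scaling


class LinearWithLoRA(nn.Module):
    def __init__(self, linear_layer: nn.Linear, rank: int = 16, lora_alpha: int = 32):
        super().__init__()
        self.linear = linear_layer
        for p in self.linear.parameters():
            p.requires_grad = False  # freeze the base weight and bias
        self.lora = LoRALayer(linear_layer.in_features, linear_layer.out_features,
                              rank, lora_alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)


base = nn.Linear(32, 48)
wrapped = LinearWithLoRA(base, rank=4, lora_alpha=8)
x = torch.randn(3, 5, 32)
same_at_init = torch.allclose(wrapped(x), base(x))
trainable = [n for n, p in wrapped.named_parameters() if p.requires_grad]
print(same_at_init, trainable)
```

Checking that the wrapped layer reproduces the base layer at initialization is a cheap sanity test worth keeping in your own implementation.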

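1.3 Implement the LoRA Injection Function

Python

def inject_lora(model, target_modules, rank=16, lora_alpha=32):
    """
    Inject LoRA into the specified modules of a model.

    Key points:
    1. Iterate over all modules of the model
    2. Find the target modules (e.g. q_proj, v_proj)
    3. Replace each one with LinearWithLoRA
    4. Make sure the original weights are frozen

    Args:
        model: pretrained model
        target_modules: list of target module names
        rank: LoRA rank
        lora_alpha: LoRA alpha (numerator of the scaling factor)

    Returns:
        The modified model
    """
    pass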
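
One way to approach the injection step is sketched below, under a few assumptions: targets are matched on the last component of the module name, only nn.Linear modules are wrapped, and ToyBlock is a hypothetical stand-in for a transformer block. The compact LinearWithLoRA is repeated so the block runs on its own.

```python
import torch
import torch.nn as nn


class LinearWithLoRA(nn.Module):
    """Compact stand-in for the wrapper from Task 1.2."""

    def __init__(self, linear: nn.Linear, rank: int, lora_alpha: float):
        super().__init__()
        self.linear = linear
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.02)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = lora_alpha / rank

    def forward(self, x):
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling


def inject_lora(model, target_modules, rank=16, lora_alpha=32):
    # Freeze everything first; the LoRA parameters created below stay trainable.
    for p in model.parameters():
        p.requires_grad = False
    # Collect matches before replacing so the module tree is not mutated mid-iteration.
    matches = [
        name for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and name.split(".")[-1] in target_modules
    ]
    for name in matches:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        wrapped = LinearWithLoRA(getattr(parent, child_name), rank, lora_alpha)
        setattr(parent, child_name, wrapped)
    return model


class ToyBlock(nn.Module):
    """Hypothetical block with q/k/v projections; this is not real attention."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.q_proj(x) + self.k_proj(x) + self.v_proj(x)


model = inject_lora(nn.Sequential(ToyBlock(), ToyBlock()),
                    ["q_proj", "v_proj"], rank=4, lora_alpha=8)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)
```

Models whose projections are not nn.Linear need extra handling, which this sketch deliberately leaves out.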

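1.4 Implement Saving and Loading of LoRA Parameters

Python

def save_lora_weights(model, save_path):
    """
    Save only the LoRA weights.

    Key points:
    1. Iterate over all parameters
    2. Keep only the parameters with requires_grad=True
    3. Save in safetensors or .pt format
    """
    pass

def load_lora_weights(model, load_path):
    """
    Load LoRA weights.

    Key points:
    1. Load the saved LoRA weights
    2. Copy them into the matching parameters
    3. Verify that the shapes match
    """
    pass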
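
A hedged sketch of the save/load pair, using the .pt format via torch.save and torch.load. TinyAdapted is a hypothetical stand-in for a model that already has LoRA injected; the approach relies on the LoRA matrices being the only parameters with requires_grad=True.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_lora_weights(model: nn.Module, save_path: str) -> None:
    # Keep only trainable parameters; with a frozen base these are the LoRA matrices.
    lora_state = {name: p.detach().cpu().clone()
                  for name, p in model.named_parameters() if p.requires_grad}
    torch.save(lora_state, save_path)


def load_lora_weights(model: nn.Module, load_path: str) -> None:
    lora_state = torch.load(load_path, map_location="cpu", weights_only=True)
    params = dict(model.named_parameters())
    for name, tensor in lora_state.items():
        if name not in params:
            raise KeyError(f"unexpected parameter in checkpoint: {name}")
        if params[name].shape != tensor.shape:
            raise ValueError(f"shape mismatch for {name}")
        with torch.no_grad():
            params[name].copy_(tensor)


class TinyAdapted(nn.Module):
    """Hypothetical stand-in: a frozen linear layer plus trainable LoRA matrices."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8).requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(8, 2))
        self.lora_B = nn.Parameter(torch.randn(2, 8))


source, target = TinyAdapted(), TinyAdapted()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lora_weights.pt")
    save_lora_weights(source, path)
    saved_keys = sorted(torch.load(path, map_location="cpu", weights_only=True))
    load_lora_weights(target, path)
print(saved_keys)
```

The checkpoint contains only the two LoRA matrices, so the base weights must come from the original pretrained model when loading.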

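Task 2: Implement the LoRA Training Pipeline (estimated 2-3 hours)

2.1 Prepare the Dataset

We use a simple text generation task:

Python

# Example dataset: question-answer pairs
dataset = [
    {"instruction": "Explain what machine learning is", "output": "Machine learning is a branch of artificial intelligence..."},
    {"instruction": "What is a neural network", "output": "A neural network is a computational model that mimics biological neural networks..."},
    # ...
]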
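
Once each pair is tokenized, the examples have to be batched. The sketch below shows one common way to do it, which this document does not mandate: prompt tokens are masked with -100 so that the loss covers only the answer, and sequences are right-padded. The helper names `build_example` and `collate`, the pad id of 0, and the hand-written token ids are all assumptions for illustration.

```python
import torch

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids, output_ids):
    """Concatenate prompt and answer ids; mask the prompt part in the labels."""
    return {
        "input_ids": prompt_ids + output_ids,
        "labels": [IGNORE_INDEX] * len(prompt_ids) + output_ids,
    }


def collate(batch, pad_id=0):
    """Right-pad a list of examples into [batch, max_len] tensors."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids = torch.full((len(batch), max_len), pad_id, dtype=torch.long)
    labels = torch.full((len(batch), max_len), IGNORE_INDEX, dtype=torch.long)
    attention_mask = torch.zeros((len(batch), max_len), dtype=torch.long)
    for i, ex in enumerate(batch):
        n = len(ex["input_ids"])
        input_ids[i, :n] = torch.tensor(ex["input_ids"])
        labels[i, :n] = torch.tensor(ex["labels"])
        attention_mask[i, :n] = 1
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}


# Token ids are written by hand here; in practice they come from the tokenizer.
batch = collate([build_example([5, 6], [7, 8, 9]), build_example([1], [2])])
print(batch["labels"])
```

Padding positions are also set to -100 in the labels, so they contribute nothing to the loss.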

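2.2 Implement the Training Loop

Python

def train_lora(model, train_dataloader, val_dataloader, config):
    """
    LoRA training loop.

    Key points:
    1. Optimize only the parameters with requires_grad=True
    2. Implement learning-rate warmup
    3. Record training and validation loss
    4. Save the best model
    5. Periodically evaluate generation quality

    Args:
        model: model with LoRA injected
        train_dataloader: training data
        val_dataloader: validation data
        config: training configuration
    """
    pass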
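
The first four points can be sketched as below. This is an illustration under simplifying assumptions: it uses a toy regression model with MSE loss instead of a language model, linear warmup via LambdaLR, keeps the best trainable weights in memory instead of writing them to disk, and omits the generation check. The function name `train_sketch` and its arguments are hypothetical.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR


def train_sketch(model, train_batches, val_batches, lr=1e-2, epochs=3, warmup_steps=5):
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    # Linear warmup: the LR factor rises from 1/warmup_steps to 1, then stays at 1.
    scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
    loss_fn = nn.MSELoss()  # stand-in; a causal LM would use token-level cross-entropy
    history = {"train_loss": [], "val_loss": [], "learning_rate": []}
    best_val, best_state = float("inf"), None

    for _ in range(epochs):
        model.train()
        running = 0.0
        for x, y in train_batches:
            history["learning_rate"].append(optimizer.param_groups[0]["lr"])
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            scheduler.step()
            running += loss.item()
        history["train_loss"].append(running / len(train_batches))

        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_batches) / len(val_batches)
        history["val_loss"].append(val_loss)
        if val_loss < best_val:  # keep a copy of the best trainable weights
            best_val = val_loss
            best_state = {n: p.detach().clone()
                          for n, p in model.named_parameters() if p.requires_grad}
    return history, best_state


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
model[0].requires_grad_(False)  # stand-in for the frozen base model
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(6)]
frozen_before = model[0].weight.clone()
history, best_state = train_sketch(model, data[:4], data[4:])
print(history["learning_rate"][:5])
```

The returned history uses the keys train_loss, val_loss, and learning_rate, which matches what the plotting helper in the debugging section expects.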

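2.3 Implement the Evaluation Function

Python

def evaluate_generation(model, val_data, tokenizer, device):
    """
    Evaluate generation quality.

    Key points:
    1. Generate outputs for the validation set
    2. Compute perplexity
    3. Save generated samples for manual inspection
    """
    pass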
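
For the perplexity part, a minimal sketch follows. It assumes the usual causal-LM convention that the logits at position t predict the token at position t+1, and that masked labels use -100; the helper name `perplexity_from_logits` is hypothetical. Perplexity is the exponential of the mean token-level cross-entropy.

```python
import math

import torch
import torch.nn.functional as F


def perplexity_from_logits(logits, labels, ignore_index=-100):
    """logits: [batch, seq_len, vocab]; labels: [batch, seq_len]."""
    # Shift so that position t is scored against the token at t + 1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=ignore_index,
    )
    return math.exp(loss.item())


vocab_size = 10
logits = torch.zeros(2, 6, vocab_size)  # uniform predictions over the vocabulary
labels = torch.randint(0, vocab_size, (2, 6))
ppl = perplexity_from_logits(logits, labels)
print(ppl)
```

A model that predicts uniformly over V tokens has perplexity V, which makes this toy input a handy sanity check for your own implementation.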

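Task 3: Implement LoRA Weight Merging (estimated 1 hour)

3.1 Merge LoRA into the Base Model

Python

def merge_lora_weights(model):
    """
    Merge the LoRA weights into the base model.

    Key points:
    1. For every LinearWithLoRA layer:
       - Compute the merged weight: W_merged = W_base + (A @ B).T * scaling
         (A is [in_features, rank], B is [rank, out_features];
          nn.Linear stores its weight as [out_features, in_features])
       - Replace the layer with a plain linear layer
    2. Return the merged model (no LoRA modules left)

    Benefits:
    - Faster inference than the unmerged model (no extra matrix multiplications)
    - No separate LoRA parameters to store (the merged model is the same size as the base model)
    """
    pass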
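
A hedged sketch of the merge, assuming the shape convention A: [in_features, rank] and B: [rank, out_features], which is why the product is transposed before it is added to the nn.Linear weight. The compact LinearWithLoRA and the helper `merge_layer` are illustrative names; the sketch creates the merged layer on the CPU in float32 and does not handle a model whose root module is itself a LinearWithLoRA.

```python
import copy

import torch
import torch.nn as nn


class LinearWithLoRA(nn.Module):
    """Compact stand-in for the wrapper from Task 1.2."""

    def __init__(self, linear: nn.Linear, rank: int, lora_alpha: float):
        super().__init__()
        self.linear = linear
        self.lora_A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.02)
        self.lora_B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scaling = lora_alpha / rank

    def forward(self, x):
        return self.linear(x) + (x @ self.lora_A @ self.lora_B) * self.scaling


def merge_layer(layer: LinearWithLoRA) -> nn.Linear:
    base = layer.linear
    merged = nn.Linear(base.in_features, base.out_features, bias=base.bias is not None)
    with torch.no_grad():
        # A @ B is [in, out]; nn.Linear stores weight as [out, in], hence the transpose.
        delta = (layer.lora_A @ layer.lora_B).T * layer.scaling
        merged.weight.copy_(base.weight + delta)
        if base.bias is not None:
            merged.bias.copy_(base.bias)
    return merged


def merge_lora_weights(model: nn.Module) -> nn.Module:
    for parent in list(model.modules()):
        for child_name, child in list(parent.named_children()):
            if isinstance(child, LinearWithLoRA):
                setattr(parent, child_name, merge_layer(child))
    return model


torch.manual_seed(0)
model = nn.Sequential(LinearWithLoRA(nn.Linear(8, 6), rank=2, lora_alpha=4), nn.ReLU())
with torch.no_grad():
    model[0].lora_B.normal_()  # pretend training moved B away from zero
    x = torch.randn(5, 8)
    expected = model(x)
    merged = merge_lora_weights(copy.deepcopy(model))
    max_diff = (merged(x) - expected).abs().max().item()
print(max_diff)
```

Merging a deep copy keeps the original LoRA model available for comparison, which is exactly what the verification step in 3.2 needs.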

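3.2 Verify That the Merge Is Correct

Python

def verify_merge(original_model, merged_model, test_input):
    """
    Verify that the merged model produces the same output as the LoRA model.

    Key points:
    1. Run both models on the same input
    2. Check that the outputs agree (within numerical tolerance)
    """
    pass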
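
A minimal sketch of the check, assuming both models return plain tensors (a Hugging Face model returns an output object, in which case you would compare its logits). The tolerance of 1e-5 and the extra `atol` argument are illustrative choices.

```python
import copy

import torch
import torch.nn as nn


def verify_merge(original_model, merged_model, test_input, atol=1e-5):
    """Return (outputs_match, max_abs_difference) for the two models."""
    original_model.eval()
    merged_model.eval()
    with torch.no_grad():
        out_original = original_model(test_input)
        out_merged = merged_model(test_input)
    max_diff = (out_original - out_merged).abs().max().item()
    return torch.allclose(out_original, out_merged, atol=atol), max_diff


torch.manual_seed(0)
reference = nn.Linear(8, 4)
good_copy = copy.deepcopy(reference)
bad_copy = copy.deepcopy(reference)
with torch.no_grad():
    bad_copy.weight.add_(0.1)  # simulate a merge that used the wrong scaling

x = torch.rand(3, 8)
ok_good, diff_good = verify_merge(reference, good_copy, x)
ok_bad, diff_bad = verify_merge(reference, bad_copy, x)
print(ok_good, diff_good, ok_bad, diff_bad)
```

Reporting the maximum absolute difference alongside the boolean makes it easier to tell a rounding-level mismatch from a genuine bug.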

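Experiments

Experiment 1: Full Fine-Tuning versus LoRA

Python

# Task: compare the following on the same dataset
# 1. Full fine-tuning (if GPU memory allows)
# 2. LoRA fine-tuning (r=8, r=16, r=32)
# 3. Record:
#    - Training time
#    - GPU memory usage
#    - Final loss
#    - Generation quality (human evaluation)
#    - Number of trainable parameters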
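
For the trainable-parameter column, the counts can be worked out before running anything. A LoRA branch on an in_features x out_features layer adds rank * (in_features + out_features) parameters. The sketch below applies this to GPT-2 small's c_attn projection (768 to 2304); the helper name `lora_param_count` is hypothetical, and bias terms are ignored.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA branch: A is [in, r], B is [r, out]."""
    return rank * (in_features + out_features)


in_features, out_features = 768, 2304  # GPT-2 small c_attn: hidden size 768 -> 3 * 768
full = in_features * out_features      # weight matrix only, bias ignored

for rank in (8, 16, 32):
    added = lora_param_count(in_features, out_features, rank)
    print(f"r={rank:>2}: {added:>7,} LoRA params vs {full:,} full "
          f"({100 * added / full:.2f}%)")
```

At r=8 this layer gains 24,576 trainable parameters against 1,769,472 in the full weight matrix, about 1.4%.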

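Experiment 2: Effect of Different target_modules

Python

# Task: test different module combinations
# 1. Adapt q_proj only
# 2. Adapt v_proj only
# 3. Adapt q_proj + v_proj
# 4. Adapt q_proj + k_proj + v_proj
# 5. Adapt all attention layers
# 6. Adapt attention layers + FFN layers

# Record the results and the parameter count for each configuration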

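Experiment 3: Effect of Different Ranks

Python

# Task: test different ranks
# rank values: [1, 2, 4, 8, 16, 32, 64]

# Observe:
# - Does a higher rank improve results?
# - Does a higher rank lead to overfitting?
# - Which rank works best?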

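Usage Examples

Basic Workflow

Python

# 1. Load the pretrained model
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# 2. Inject LoRA
from lora import inject_lora, print_trainable_parameters

# Note: in Hugging Face GPT-2, c_attn is a transformers Conv1D module
# (weight shape [in_features, out_features]), not nn.Linear, so inject_lora
# has to handle Conv1D for this target to match.
model = inject_lora(
    model,
    target_modules=["c_attn"],  # GPT-2's fused QKV attention projection
    rank=16,
    lora_alpha=32
)

# 3. Print the trainable parameters
print_trainable_parameters(model)

# 4. Train (train_lora.py is the module you write in Task 2;
#    train_loader, val_loader, and config come from your own setup)
from train_lora import train_lora
train_lora(model, train_loader, val_loader, config)

# 5. Save the LoRA weights
from lora import save_lora_weights
save_lora_weights(model, "lora_weights.pt")

# 6. Merge the weights (optional; merge_lora.py is the module you write in Task 3)
from merge_lora import merge_lora_weights
merged_model = merge_lora_weights(model)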

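Loading and Inference

Python

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from lora import inject_lora, load_lora_weights

# 1. Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# 2. Inject the LoRA structure (same settings as during training)
model = inject_lora(model, target_modules=["c_attn"], rank=16, lora_alpha=32)

# 3. Load the LoRA weights
load_lora_weights(model, "lora_weights.pt")

# 4. Run inference
inputs = tokenizer("Explain what machine learning is", return_tensors="pt")
model.eval()
with torch.no_grad():  # disable gradient tracking to save memory at inference time
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))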

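Evaluation Metrics

Quantitative Metrics

Python

# 1. Perplexity
# Lower is better; it measures how well the model predicts the text

# 2. Training/validation loss
# Watch the gap between the two for signs of overfitting

# 3. Number of trainable parameters
# This is where LoRA's advantage lies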

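Qualitative Evaluation

Python

# 1. Prepare test prompts
test_prompts = [
    "Explain what deep learning is",
    "List three applications of artificial intelligence",
    "What is the Transformer architecture",
]

# 2. Generate and evaluate by hand
# - Relevance: is the output related to the prompt?
# - Fluency: is the language natural and fluent?
# - Accuracy: are the facts correct?
# - Diversity: do different prompts produce distinguishable outputs?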

Debugging Tips

Check That LoRA Was Injected Correctly

Python

import torch.nn as nn
from lora import LinearWithLoRA

def check_lora_injection(model):
    """Check whether LoRA was injected correctly."""
    lora_layers = 0
    frozen_layers = 0
    trainable_layers = 0

    for name, module in model.named_modules():
        if isinstance(module, LinearWithLoRA):
            lora_layers += 1
            print(f"✓ LoRA layer: {name}")
        elif isinstance(module, nn.Linear):
            # No parameter requires grad, so this linear layer is fully frozen
            if not any(p.requires_grad for p in module.parameters()):
                frozen_layers += 1
            else:
                trainable_layers += 1
                print(f"⚠ Linear layer not frozen: {name}")

    print(f"\nLoRA layers: {lora_layers}")
    print(f"Frozen linear layers: {frozen_layers}")
    print(f"Trainable linear layers: {trainable_layers}")

Check Gradient Flow

Python

def check_gradients(model):
    """Check that gradients are computed (call after loss.backward())."""
    for name, param in model.named_parameters():
        if param.requires_grad:
            if param.grad is None:
                print(f"⚠ No gradient: {name}")
            else:
                grad_norm = param.grad.norm().item()
                print(f"✓ {name}: grad norm={grad_norm:.6f}")

Visualize the Training Process

Python

import matplotlib.pyplot as plt

def plot_training_history(history):
    """Plot the training history (loss per epoch, learning rate per step)."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    # Loss curves
    axes[0].plot(history['train_loss'], label='Train')
    axes[0].plot(history['val_loss'], label='Val')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].set_title('Loss Curve')

    # Learning-rate curve
    axes[1].plot(history['learning_rate'])
    axes[1].set_xlabel('Step')
    axes[1].set_ylabel('Learning Rate')
    axes[1].set_title('Learning Rate Schedule')

    plt.tight_layout()
    plt.savefig('training_history.png')

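Frequently Asked Questions

Q1: The loss does not decrease

Possible causes:
- LoRA was not injected correctly
- The learning rate is too small
- The original weights were not frozen, or the LoRA parameters were frozen by mistake
- The data preprocessing is faulty

Solutions:
- Run check_lora_injection
- Increase the learning rate (LoRA usually needs 1e-4 to 1e-3)
- Print the trainable parameters to confirm what is being optimized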

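Q2: GPU memory is still insufficient

Solutions:
- Reduce batch_size and use gradient accumulation
- Reduce the rank (for example, from 16 to 8)
- Use FP16/BF16 mixed precision
- Reduce the number of target_modules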
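
Gradient accumulation, the first item above, can be sketched as follows. This is a toy illustration with an MSE loss and a single linear layer standing in for the LoRA model; the function name `train_with_accumulation` is hypothetical, and mixed precision is not covered here.

```python
import torch
import torch.nn as nn


def train_with_accumulation(model, batches, accumulation_steps=4, lr=1e-3):
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    loss_fn = nn.MSELoss()  # stand-in for the language-modeling loss
    updates = 0
    optimizer.zero_grad()
    for step, (x, y) in enumerate(batches, start=1):
        # Divide so the accumulated gradient matches one large-batch average.
        loss = loss_fn(model(x), y) / accumulation_steps
        loss.backward()  # gradients add up across micro-batches
        if step % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            updates += 1
    return updates


torch.manual_seed(0)
model = nn.Linear(4, 1)
micro_batches = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(8)]
updates = train_with_accumulation(model, micro_batches, accumulation_steps=4)
print(updates)
```

Eight micro-batches of size 2 with accumulation_steps=4 give two optimizer updates, each with an effective batch size of 8, while peak activation memory stays at the micro-batch level.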

Q3: Severe overfitting

Solutions:
- Reduce the rank
- Increase lora_dropout
- Train for fewer epochs
- Add regularization (for example, weight decay)
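
The lora_dropout option is not part of the skeleton in this document, so the sketch below shows one common way to add it, as an assumption: dropout is applied to the input of the LoRA branch only, leaving the frozen base path untouched. The class name and the default of 0.1 are illustrative.

```python
import torch
import torch.nn as nn


class LoRALayerWithDropout(nn.Module):
    def __init__(self, in_features, out_features, rank=8, lora_alpha=16, lora_dropout=0.1):
        super().__init__()
        self.scaling = lora_alpha / rank
        self.dropout = nn.Dropout(p=lora_dropout)
        self.A = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(rank, out_features))

    def forward(self, x):
        # Dropout affects only the LoRA branch input; it is a no-op in eval mode.
        return (self.dropout(x) @ self.A @ self.B) * self.scaling


torch.manual_seed(0)
layer = LoRALayerWithDropout(16, 16, rank=4, lora_dropout=0.5)
with torch.no_grad():
    layer.B.normal_()  # pretend training moved B away from zero
    x = torch.randn(2, 3, 16)
    layer.eval()
    eval_out_1, eval_out_2 = layer(x), layer(x)
    no_dropout_out = (x @ layer.A @ layer.B) * layer.scaling
deterministic_in_eval = torch.equal(eval_out_1, eval_out_2)
print(deterministic_in_eval)
```

Remember to call model.eval() before evaluation or merging, otherwise dropout keeps perturbing the LoRA branch.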

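Q4: Results get worse after merging

Possible causes:
- The merge computation is wrong
- The scaling factor was not applied correctly

Solutions:
- Run verify_merge
- Check how scaling is computed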

Advanced Challenges

Challenge 1: Implement QLoRA

Python

# Use the bitsandbytes library to combine 4-bit quantization with LoRA
# Goal: fine-tune a 7B model on 8 GB of GPU memory

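Challenge 2: Switching Between Multiple LoRAs

Python

# Load several sets of LoRA weights at the same time
# Support switching between task-specific LoRAs at runtime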
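
As a starting point for this challenge, here is one possible design, offered as a sketch rather than a recommended architecture: a single frozen linear layer holds several named adapter pairs in nn.ParameterDict containers, and a flag selects which one is active. The class name, method names, and adapter names are all hypothetical.

```python
import torch
import torch.nn as nn


class MultiLoRALinear(nn.Module):
    """Frozen linear layer with several named LoRA adapters; one is active at a time."""

    def __init__(self, linear: nn.Linear, rank: int = 4, lora_alpha: float = 8.0):
        super().__init__()
        self.linear = linear.requires_grad_(False)
        self.rank = rank
        self.scaling = lora_alpha / rank
        self.lora_A = nn.ParameterDict()
        self.lora_B = nn.ParameterDict()
        self.active = None  # None means "base model only"

    def add_adapter(self, name: str) -> None:
        self.lora_A[name] = nn.Parameter(
            torch.randn(self.linear.in_features, self.rank) * 0.02)
        self.lora_B[name] = nn.Parameter(
            torch.zeros(self.rank, self.linear.out_features))

    def set_adapter(self, name) -> None:
        if name is not None and name not in self.lora_A:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x):
        out = self.linear(x)
        if self.active is not None:
            a, b = self.lora_A[self.active], self.lora_B[self.active]
            out = out + (x @ a @ b) * self.scaling
        return out


torch.manual_seed(0)
layer = MultiLoRALinear(nn.Linear(8, 8))
for task in ("task_a", "task_b"):  # hypothetical adapter names
    layer.add_adapter(task)
    with torch.no_grad():
        layer.lora_B[task].normal_()  # pretend each adapter was trained

x = torch.randn(2, 8)
outputs = {}
with torch.no_grad():
    for task in (None, "task_a", "task_b"):
        layer.set_adapter(task)
        outputs[task] = layer(x)
    base_out = layer.linear(x)
```

Switching costs nothing at runtime because only the lookup key changes; the trade-off is that all adapters stay resident in memory.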

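Challenge 3: Mixing LoRA with Full Fine-Tuning

Python

# Use LoRA for some layers and full fine-tuning for others
# Example: fully fine-tune the embedding layer, use LoRA for the attention layers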

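Get Started

Using the lora.py skeleton in the appendix as a starting point, begin implementing your first LoRA layer.

Remember: understand what every line of code does instead of copying it mechanically.

Good luck! 🚀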

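Appendix: Complete Code Reference

⚠️ Warning: try to implement everything yourself first, and refer to the code below only when you get stuck.

Dependencies (requirements.txt)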

Text Only

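# Core dependencies
torch>=2.0.0
numpy>=1.24.0

# Needed by the usage examples (loading GPT-2)
transformers

# Visualization
matplotlib>=3.7.0
seaborn>=0.12.0

# Progress bars
tqdm>=4.65.0

# Optional: for downloading data
requests>=2.31.0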

Core LoRA Implementation (lora.py)

Python

"""
Core LoRA implementation.

Task: implement all LoRA components from scratch.
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from collections.abc import Callable

class LoRALayer(nn.Module):
    """
    Core implementation of a LoRA layer.

    Exercise: implement the following
    1. Create the low-rank matrices A and B
    2. Initialize A randomly (kaiming_uniform_ below; Gaussian also works) and B with zeros
    3. Forward pass: (x @ A @ B) * scaling

    Input:  [batch, seq_len, in_features]
    Output: [batch, seq_len, out_features]
    """

    def __init__(self, in_features: int, out_features: int, rank: int = 16, lora_alpha: int = 1):
        super().__init__()

        self.in_features = in_features
        self.out_features = out_features
        self.rank = rank
        self.lora_alpha = lora_alpha

        # scaling = alpha / rank
        self.scaling = lora_alpha / rank

        # Exercise: create the low-rank matrices A and B
        # A: [in_features, rank]
        # B: [rank, out_features]

        # Exercise: initialize A with kaiming_uniform_
        # Exercise: initialize B with zeros

        raise NotImplementedError("Implement LoRALayer.__init__")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass.

        Args:
            x: [batch, seq_len, in_features]
        Returns:
            output: [batch, seq_len, out_features]
        """
        # Exercise: implement the LoRA forward pass
        # 1. x @ A -> [batch, seq_len, rank]
        # 2. @ B -> [batch, seq_len, out_features]
        # 3. * scaling

        raise NotImplementedError("Implement LoRALayer.forward")

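class LinearWithLoRA(nn.Module):
    """
    A linear layer with a LoRA branch.

    Exercise: implement the following
    1. Accept an existing linear layer
    2. Freeze the parameters of the original linear layer
    3. Add a LoRA layer
    4. Forward pass: original output + LoRA output
    """

    def __init__(self, linear_layer: nn.Linear, rank: int = 16, lora_alpha: int = 1):
        super().__init__()

        # Exercise: store the original linear layer and freeze its parameters
        # Exercise: create the LoRA layer

        raise NotImplementedError("Implement LinearWithLoRA.__init__")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass: original linear output + LoRA output."""
        raise NotImplementedError("Implement LinearWithLoRA.forward")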

def inject_lora(model: nn.Module, target_modules: list[str], rank: int = 16, lora_alpha: int = 32) -> nn.Module:
    """
    Inject LoRA into the specified modules of a model.

    Args:
        model: pretrained model
        target_modules: list of target module names, e.g. ["q_proj", "v_proj"]
        rank: LoRA rank
        lora_alpha: LoRA alpha
    Returns:
        The modified model
    """
    # Exercise: implement LoRA injection
    raise NotImplementedError("Implement inject_lora")

def get_lora_parameters(model: nn.Module):
    """Return all LoRA parameters."""
    raise NotImplementedError("Implement get_lora_parameters")

def freeze_base_model(model: nn.Module):
    """Freeze all base-model parameters (leave the LoRA parameters trainable)."""
    raise NotImplementedError("Implement freeze_base_model")

def unfreeze_lora_parameters(model: nn.Module):
    """Unfreeze only the LoRA parameters."""
    raise NotImplementedError("Implement unfreeze_lora_parameters")

def save_lora_weights(model: nn.Module, save_path: str):
    """Save only the LoRA weights (parameters with requires_grad=True)."""
    raise NotImplementedError("Implement save_lora_weights")

def load_lora_weights(model: nn.Module, load_path: str):
    """Load LoRA weights."""
    raise NotImplementedError("Implement load_lora_weights")

def print_trainable_parameters(model: nn.Module):
    """Print information about the trainable parameters."""
    trainable_params = 0
    all_params = 0

    for name, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
            print(f"trainable: {name}: {param.numel()}")

    print(f"\ntrainable params: {trainable_params:,} || "
          f"all params: {all_params:,} || "
          f"ratio: {100*trainable_params/all_params:.4f}%")

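def merge_lora_weights(linear_with_lora: LinearWithLoRA) -> nn.Linear:
    """
    Merge the LoRA weights of one layer into its base linear layer.

    This is the per-layer step; the model-level merge from Task 3.1
    applies it to every LinearWithLoRA in the model.

    W_merged = W_base + (A @ B).T * scaling
    (A is [in_features, rank], B is [rank, out_features];
     nn.Linear stores its weight as [out_features, in_features])
    """
    raise NotImplementedError("Implement merge_lora_weights")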

Last updated: 2026-02-12. Applies to: LLM Learning Tutorial v2026.