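Hands-On Project: A Neural Network from Scratch

Goal: implement a complete multilayer perceptron (MLP) from scratch in pure NumPy, train it on the MNIST dataset, and reach a test accuracy of 95% or higher.

📋 Project Overview

This project walks you through a complete neural network training pipeline, including:

- Forward propagation
- Backpropagation
- Parameter updates
- The training loop
- Model evaluation

By the end, you will understand how a neural network actually works.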

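🗂️ File Structure

Text Only

实践-手写神经网络/
└── README.md              # This file (contains all reference code; see the appendix)

The complete contents of both code files (mlp_numpy.py and train.py) are included in the "Appendix: Complete Code Reference" section of this document. Try implementing them yourself first, and consult the appendix only when you get stuck.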

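🎯 Learning Objectives

After completing this project, you will be able to:

- Understand forward propagation: how data flows through the network
- Understand backpropagation: how gradients are computed and propagated
- Understand parameter updates: how gradient descent optimizes the network
- Debug the training process: identify and fix problems during training
- Evaluate model performance: accuracy, loss curves, and related metrics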

📚 Theoretical Background

The Neural Network Training Pipeline

Text Only

┌─────────────────────────────────────────────────────────────────────┐
│                  Neural Network Training Pipeline                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. Initialize parameters                                           │
│     └── Randomly initialize weights W and biases b                  │
│                                                                     │
│  2. Forward propagation                                             │
│     └── Compute the predicted output                                │
│                                                                     │
│  3. Compute the loss                                                │
│     └── Measure the gap between predictions and targets             │
│                                                                     │
│  4. Backpropagation                                                 │
│     └── Compute gradients (loss derivative per parameter)           │
│                                                                     │
│  5. Parameter update                                                │
│     └── Update parameters with gradient descent                     │
│                                                                     │
│  6. Repeat steps 2-5 until convergence                              │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

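Backpropagation Essentials

Backpropagation uses the chain rule to compute gradients:

Text Only

Gradient of the loss L with respect to the weights W = ∂L/∂W

By the chain rule:
∂L/∂W = ∂L/∂y · ∂y/∂z · ∂z/∂W

where:
• ∂L/∂y: gradient of the loss with respect to the output
• ∂y/∂z: derivative of the activation function
• ∂z/∂W: gradient of the linear transform z = xW + b (which is simply the input x)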
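To make the three factors concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss. The toy values, the choice of sigmoid, and the names `loss_fn`, `grad_chain`, and `grad_numeric` are assumptions made only for illustration; they are not part of the project code. The chain-rule gradient is compared against a central-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy values, chosen only for illustration.
x = np.array([0.5, -1.0, 2.0])   # input
W = np.array([0.1, 0.4, -0.3])   # weights
t = 1.0                          # target

def loss_fn(w):
    """Squared-error loss of a single sigmoid neuron."""
    y = sigmoid(w @ x)
    return 0.5 * (y - t) ** 2

# The three chain-rule factors
z = W @ x
y = sigmoid(z)
dL_dy = y - t            # ∂L/∂y
dy_dz = y * (1.0 - y)    # ∂y/∂z, derivative of the sigmoid
dz_dW = x                # ∂z/∂W, the input itself
grad_chain = dL_dy * dy_dz * dz_dW

# Central-difference estimate for comparison
h = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(W.size):
    w_plus = W.copy()
    w_plus[i] += h
    w_minus = W.copy()
    w_minus[i] -= h
    grad_numeric[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * h)

max_diff = np.max(np.abs(grad_chain - grad_numeric))
print("chain rule:", grad_chain)
print("numeric:   ", grad_numeric)
print("max abs difference:", max_diff)
```

The two gradients should agree to many decimal places, which is the same idea the numerical gradient check in the FAQ applies to the full network.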

🚀 Quick Start

Requirements

Bash

# Python 3.10+
# NumPy
# Matplotlib (for visualization)

pip install numpy matplotlib

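Running Training

Bash

# Go to the project directory
cd 实践-手写神经网络

# Run training (first save mlp_numpy.py and train.py from the appendix into this directory)
python train.py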

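Expected Output

The output below is illustrative. Exact numbers vary from run to run, and the reference train.py in the appendix prints a slightly different per-epoch format.

Text Only

Network architecture: 784-128-10
Number of parameters: 101770

Starting training...
Epoch 0/10
  Batch 0/938, Loss: 2.3026
  Batch 100/938, Loss: 1.8234
  Batch 200/938, Loss: 1.2456
  ...
Epoch 0 finished, average loss: 0.8234, accuracy: 78.45%

Epoch 1/10
  ...
Epoch 1 finished, average loss: 0.4234, accuracy: 88.23%

...

Training complete!
Final test accuracy: 95.67%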

📝 Implementation Guide

Step 1: Implement Forward Propagation

See the forward method of mlp_numpy.py in the appendix.

Key formulas:

Python

# Layer 1
z1 = X @ W1 + b1
h1 = relu(z1)

# Layer 2
z2 = h1 @ W2 + b2
output = softmax(z2)
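The formulas above can be run end to end as a small standalone sketch. The batch size of 4, the random inputs, and the standalone `relu` and `softmax` helpers are assumptions for illustration; the appendix implements the same steps as methods of the MLP class.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch_size, n_in, n_hidden, n_out = 4, 784, 128, 10

# Random stand-in for a batch of flattened MNIST images
X = rng.standard_normal((batch_size, n_in))

# He-style initialization, as used in the appendix
W1 = rng.standard_normal((n_in, n_hidden)) * np.sqrt(2.0 / n_in)
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_out)) * np.sqrt(2.0 / n_hidden)
b2 = np.zeros(n_out)

# Layer 1
z1 = X @ W1 + b1
h1 = relu(z1)

# Layer 2
z2 = h1 @ W2 + b2
output = softmax(z2)

print("z1:", z1.shape, "h1:", h1.shape, "z2:", z2.shape, "output:", output.shape)
print("row sums:", output.sum(axis=1))
```

Each row of `output` is a probability distribution over the 10 classes, so the row sums should all be 1 up to floating-point error.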

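Step 2: Implement Backpropagation

See the backward method of mlp_numpy.py in the appendix.

Key gradients:

Python

# Output-layer gradient. y_true must be one-hot encoded here;
# divide by the batch size if the loss is averaged over the batch.
dz2 = output - y_true  # gradient of softmax + cross-entropy

# Hidden-layer gradient
dh1 = dz2 @ W2.T
dz1 = dh1 * (z1 > 0)   # derivative of ReLU

# Parameter gradients
dW2 = h1.T @ dz2
db2 = np.sum(dz2, axis=0)
dW1 = X.T @ dz1
db1 = np.sum(dz1, axis=0)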
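The expression `output - y_true` only works when `y_true` is one-hot, while the appendix code stores labels as integers and subtracts 1 at the labeled index instead. The sketch below, using made-up logits and labels, shows that the two forms give the same output-layer gradient once both are averaged over the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_classes = 5, 10

# Made-up logits and integer labels, for illustration only
logits = rng.standard_normal((batch_size, num_classes))
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
output = exp / exp.sum(axis=1, keepdims=True)
labels = rng.integers(0, num_classes, size=batch_size)

# Form 1: convert labels to one-hot, then subtract
y_onehot = np.eye(num_classes)[labels]
dz2_onehot = (output - y_onehot) / batch_size

# Form 2: subtract 1 at the labeled index (the approach used in the appendix)
dz2_index = output.copy()
dz2_index[np.arange(batch_size), labels] -= 1
dz2_index /= batch_size

print("max abs difference:", np.max(np.abs(dz2_onehot - dz2_index)))
```

Because each softmax row sums to 1 and each one-hot row sums to 1, every row of the gradient sums to 0, which is a quick sanity check for your own implementation.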

Step 3: Implement the Parameter Update

Python

# Gradient descent
W1 -= learning_rate * dW1
b1 -= learning_rate * db1
W2 -= learning_rate * dW2
b2 -= learning_rate * db2
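To build intuition for how the learning rate affects this update rule, here is a toy sketch on the one-dimensional function f(w) = w², whose gradient is 2w. The function, the starting point, and the three learning rates are assumptions picked only to show the three regimes; they say nothing about good values for MNIST.

```python
def gradient_descent(lr, steps=50, w0=5.0):
    """Minimize f(w) = w**2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # f'(w)
        w -= lr * grad      # same update rule as for W1, b1, W2, b2
    return w

w_good = gradient_descent(lr=0.1)     # shrinks by a factor 0.8 per step
w_small = gradient_descent(lr=0.001)  # barely moves in 50 steps
w_big = gradient_descent(lr=1.1)      # overshoots and diverges

print(f"lr=0.1   -> w = {w_good:.6f}")
print(f"lr=0.001 -> w = {w_small:.6f}")
print(f"lr=1.1   -> w = {w_big:.1f}")
```

A well-chosen rate converges toward the minimum at 0, a rate that is too small makes almost no progress, and a rate that is too large makes the iterate grow in magnitude at every step.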

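🎓 Study Tips

1. Understand the Principles First

Don't rush to run the code. First make sure you understand:

- How forward propagation is computed
- How gradients flow during backpropagation
- What the loss function means

2. Verify Step by Step

Test each piece as soon as you implement it:

- Check that output shapes are correct
- Check that value ranges are reasonable
- Compare against results from PyTorch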
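One concrete value-range check: with 10 classes, a network that predicts a uniform distribution has a cross-entropy loss of ln(10) ≈ 2.3026. A freshly initialized network should start near this value, though not exactly at it. The sketch below uses a hand-built uniform prediction (an assumption for illustration) rather than a real network.

```python
import numpy as np

num_classes = 10
batch_size = 8

# A uniform prediction: every class gets probability 1/10
probs = np.full((batch_size, num_classes), 1.0 / num_classes)
labels = np.arange(batch_size) % num_classes

# Mean cross-entropy of the correct class
loss = -np.mean(np.log(probs[np.arange(batch_size), labels]))
expected = np.log(num_classes)

print(f"loss = {loss:.4f}, ln(10) = {expected:.4f}")  # both print 2.3026
```

If your first-batch loss is far above this value, the initial weights are probably too large; if it is NaN or infinite, check the softmax and the logarithm for numerical problems.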

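3. Debug with Visualization

Use visualization tools to check:

- Whether the loss curve is decreasing
- Whether gradient distributions look normal
- Whether the weights change sensibly

4. Experiment

Try changing the following and observe the effect:

- Learning rate (what happens when it is too large or too small?)
- Hidden layer size
- Activation function (Sigmoid vs. ReLU)
- Batch size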

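❓ FAQ

Q1: What if the loss does not decrease?

A: Check the following:

- Is the learning rate appropriate? (Try values from 1e-4 to 1e-1.)
- Are the gradients computed correctly? (Run a numerical gradient check.)
- Is the data preprocessed correctly? (Normalization.)
- Is the weight initialization reasonable? (Avoid overly large initial values.)

Q2: What if the accuracy is very low?

A: Possible causes:

- Too few training epochs
- Insufficient network capacity (add hidden units)
- Overfitting (high training accuracy but low test accuracy)
- Data problems (check the data loading code)

Q3: What if training is slow?

A: Ways to speed it up:

- Use a larger batch size
- Use a smaller network
- Validate on a subset of the data first
- Vectorize the computation (avoid Python loops)

Q4: How do I verify that backpropagation is correct?

A: Use a numerical gradient check: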

Python

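def numerical_gradient(f, x, h=1e-5):
    """Estimate the gradient of a scalar function f at x by central differences.

    x must be a float NumPy array; it is perturbed in place and then restored.
    """
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old_val = x[idx]

        x[idx] = old_val + h   # evaluate f(x + h)
        fxh1 = f(x)

        x[idx] = old_val - h   # evaluate f(x - h)
        fxh2 = f(x)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = old_val       # restore the original value
        it.iternext()
    return grad

# Compare the analytic gradient with the numerical gradient.
# The two should agree very closely (relative error < 1e-7).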
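A possible way to apply this check is sketched below on a single softmax layer with cross-entropy loss. The tiny layer sizes, the helper `softmax_cross_entropy`, and the norm-based relative error are assumptions for illustration; `numerical_gradient` is repeated so the block runs on its own.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of scalar f at float array x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old_val = x[idx]
        x[idx] = old_val + h
        fxh1 = f(x)
        x[idx] = old_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = old_val
        it.iternext()
    return grad

def softmax_cross_entropy(W, X, labels):
    """Mean cross-entropy of a linear layer followed by softmax."""
    z = X @ W
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, 4))           # hypothetical tiny batch
labels = rng.integers(0, 3, size=n)       # 3 classes
W = rng.standard_normal((4, 3)) * 0.1

# Analytic gradient: X^T (softmax - one_hot) / n
z = X @ W
z = z - z.max(axis=1, keepdims=True)
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
dz = probs.copy()
dz[np.arange(n), labels] -= 1
dz /= n
grad_analytic = X.T @ dz

grad_numeric = numerical_gradient(
    lambda w: softmax_cross_entropy(w, X, labels), W)

# Norm-based relative error between the two gradients
rel_error = (np.linalg.norm(grad_analytic - grad_numeric)
             / (np.linalg.norm(grad_analytic) + np.linalg.norm(grad_numeric)))
print("relative error:", rel_error)
```

For the full MLP, the same pattern applies: wrap the forward pass and loss in a function of one parameter array at a time, and compare against the matching entry returned by backward. Use a very small batch, because the check calls the loss twice per parameter.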

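✅ Training Checklist

Before you start training, confirm the following:

Data Preprocessing

- Data is normalized/standardized (pixel values are in a suitable range)
- Training and test sets are split correctly
- Data dimensions are correct (784 for MNIST)

Model Architecture

- Input layer size matches the data dimension
- Hidden layer size is reasonable (64-512)
- Output layer size matches the number of classes (10 for MNIST)
- Activation functions are chosen correctly (ReLU in the hidden layer, Softmax at the output)

Weight Initialization

- Use He initialization (suited to ReLU)
- Do not initialize all weights to the same value
- Initialize biases to 0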
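The initialization items above can be sketched as follows for the first layer (784 inputs, 128 hidden units). The variable names are assumptions for illustration; the scaling factor sqrt(2 / fan_in) matches the one used in the appendix code.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 128

# He initialization: zero-mean Gaussian with standard deviation sqrt(2 / fan_in)
W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
b = np.zeros(fan_out)  # biases start at 0

w_std = W.std()
expected_std = np.sqrt(2.0 / fan_in)
print(f"empirical std = {w_std:.4f}, target std = {expected_std:.4f}")
```

Random values matter here: if every weight started at the same constant, all hidden units would compute the same output and receive the same gradient, so they would never learn different features.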

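Training Configuration

- Learning rate is reasonable (0.001-0.1)
- Batch size is appropriate (32-128)
- Enough training epochs (10-50)
- Early stopping is configured

Training Monitoring

- The loss is decreasing
- The accuracy is improving
- No exploding gradients (loss becomes NaN)
- No severe overfitting (train_acc >> test_acc)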

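📊 Expected Results

On the MNIST dataset, you should be able to reach:

Metric	Target	Notes
Training accuracy	> 98%	Performance on the training set
Test accuracy	> 95%	Generalization to the test set
Training time	< 5 minutes	10 epochs on a CPU

If you fall short, check:

1. Is the learning rate appropriate?
2. Is the network architecture reasonable?
3. Is the data preprocessed correctly?
4. Are the gradients computed correctly?
5. Is early stopping enabled? (It may be stopping too early.)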

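🎯 Advanced Challenges

After finishing the basic implementation, try these improvements:

Challenge 1: Add More Layers

Implement a network with 3 or more hidden layers.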
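One possible starting point is to store the layers in a list and loop over them in the forward pass. The sketch below covers only the forward direction; the helper names `init_layers` and `forward` and the layer sizes are hypothetical, and the backward pass (walking the same list in reverse) is left as the actual challenge.

```python
import numpy as np

def init_layers(sizes, rng):
    """Create (W, b) pairs with He initialization for consecutive layer sizes."""
    return [
        (rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(X, layers):
    """ReLU on every hidden layer, softmax on the last layer."""
    h = X
    for W, b in layers[:-1]:
        h = np.maximum(0, h @ W + b)
    W, b = layers[-1]
    z = h @ W + b
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Three hidden layers: 784 -> 256 -> 128 -> 64 -> 10 (sizes are illustrative)
layers = init_layers([784, 256, 128, 64, 10], rng)
probs = forward(rng.standard_normal((3, 784)), layers)
print(len(layers), "weight matrices, output shape", probs.shape)
```

To support backpropagation you would also need to cache each layer's input and pre-activation, as the two-layer version in the appendix does.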

Challenge 2: Implement Different Optimizers

- Momentum
- RMSprop
- Adam
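As a hedged starting point, the sketch below shows one common formulation of the Momentum and Adam update steps, tested on the toy function f(w) = sum(w²). The function names, default hyperparameters, and the toy problem are assumptions for illustration; RMSprop is not shown, and other formulations of momentum exist.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """Momentum: accumulate a decaying sum of past gradients."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction; t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: f(w) = sum(w**2), so the gradient is 2 * w
w_m = np.array([3.0, -2.0])
vel = np.zeros(2)
w_a = np.array([3.0, -2.0])
m = np.zeros(2)
v = np.zeros(2)

for t in range(1, 501):
    w_m, vel = sgd_momentum_step(w_m, 2 * w_m, vel)
    w_a, m, v = adam_step(w_a, 2 * w_a, m, v, t)

print("momentum:", w_m)
print("adam:    ", w_a)
```

In the MLP, each parameter (W1, b1, W2, b2) needs its own optimizer state (`velocity`, or `m` and `v`) with the same shape as the parameter.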

Challenge 3: Add Regularization

- L2 regularization (weight decay)
- Dropout
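Both techniques can be sketched in a few lines. The helper names, the 0.5·λ·||W||² form of the penalty, and the "inverted dropout" scaling are assumptions chosen for illustration; they are common conventions, not the only ones.

```python
import numpy as np

def l2_penalty_and_grad(W, lam):
    """L2 penalty 0.5 * lam * sum(W**2) and its gradient lam * W."""
    return 0.5 * lam * np.sum(W ** 2), lam * W

def dropout_forward(h, keep_prob, rng, training=True):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not training:
        return h, None            # no dropout at evaluation time
    mask = (rng.random(h.shape) < keep_prob) / keep_prob
    return h * mask, mask         # reuse mask in backward: dh = dout * mask

# L2: add the penalty to the loss and the gradient term to dW
penalty, dW_reg = l2_penalty_and_grad(np.array([[1.0, -2.0]]), lam=0.1)
print("penalty:", penalty, "extra gradient:", dW_reg)

# Dropout: the mean activation is preserved in expectation
rng = np.random.default_rng(0)
h = np.ones((1000, 100))
h_drop, mask = dropout_forward(h, keep_prob=0.8, rng=rng)
print("mean after dropout:", h_drop.mean())
```

Dividing by `keep_prob` during training keeps the expected activation unchanged, so the forward pass at test time needs no rescaling.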

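Challenge 4: Better Initialization

- Xavier initialization
- He initialization (the reference implementation already uses it; compare it against Xavier)

Challenge 5: Batch Normalization

Implement Batch Normalization.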
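A minimal sketch of the training-mode forward pass is shown below. The function name `batchnorm_forward`, the `eps` value, and the test data are assumptions for illustration. A complete implementation would also track running statistics for inference and implement the backward pass, which is the harder part of this challenge.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                 # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Activations with a deliberately shifted mean and inflated scale
x = rng.standard_normal((64, 128)) * 3.0 + 5.0

out = batchnorm_forward(x, gamma=np.ones(128), beta=np.zeros(128))
print("mean range:", out.mean(axis=0).min(), out.mean(axis=0).max())
print("std range: ", out.std(axis=0).min(), out.std(axis=0).max())
```

With `gamma = 1` and `beta = 0`, every feature comes out with mean close to 0 and standard deviation close to 1; the learnable `gamma` and `beta` let the network undo the normalization where that helps.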

📚 References

- CS231n Lecture 4 - Backpropagation and Neural Networks
- "Neural Networks and Deep Learning" by Michael Nielsen (Chapter 2)
- Official PyTorch tutorials: compare your implementation with PyTorch's

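✅ Completion Checklist

Before calling the project done, confirm that you:

- Understand every step of forward propagation
- Understand how gradients flow in backpropagation
- Can derive the gradient formulas by hand
- Have code that runs without errors
- Reach a test accuracy of 95% or higher (the project target)
- Can explain what each hyperparameter does
- Have tried debugging problems in the training process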

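🎉 What You Gain

After completing this project, you will:

- ✅ Truly understand how neural networks work
- ✅ Be able to implement deep learning models from scratch
- ✅ Be able to debug and optimize neural networks on your own
- ✅ Have a solid foundation for more complex models (CNN, RNN, Transformer)

Congratulations on taking an important step toward becoming a deep learning engineer!

Next step: after finishing this project, continue to the CNN chapter to learn about convolutional neural networks.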

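Appendix: Complete Code Reference

⚠️ Warning: try to implement it yourself first, and refer to the code below only when you get stuck.

Core MLP Implementation (mlp_numpy.py)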

Python

"""
MLP implemented in NumPy

This file implements a complete multilayer perceptron, including:
- Forward propagation
- Backpropagation
- Parameter updates
"""

import numpy as np

class MLP:
    """
    Multi-Layer Perceptron (MLP)

    Architecture: input layer -> hidden layer -> output layer

    Args:
        input_size: number of input features
        hidden_size: number of hidden units
        output_size: number of output classes
    """

    def __init__(self, input_size, hidden_size, output_size):  # constructor, called automatically when an object is created
        """
        Initialize the network parameters.

        Uses He initialization (suited to ReLU activations).
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2.0 / input_size)
        self.b1 = np.zeros(hidden_size)

        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2.0 / hidden_size)
        self.b2 = np.zeros(output_size)

        # Intermediate results stored for backpropagation
        self.cache = {}

        print(f"Network architecture: {input_size}-{hidden_size}-{output_size}")
        num_params = input_size * hidden_size + hidden_size + hidden_size * output_size + output_size
        print(f"Number of parameters: {num_params}")

    def relu(self, x):
        """ReLU activation: f(x) = max(0, x)"""
        return np.maximum(0, x)

    def relu_derivative(self, x):
        """Derivative of ReLU: f'(x) = 1 if x > 0 else 0"""
        return (x > 0).astype(float)

    def softmax(self, x):
        """
        Softmax activation

        Converts the input into a probability distribution (each row sums to 1).
        """
        orig_shape = x.shape
        if x.ndim == 1:
            x = x.reshape(1, -1)  # treat a single sample as a batch of one

        # Subtract the row maximum to prevent numerical overflow
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        result = exp_x / np.sum(exp_x, axis=1, keepdims=True)

        # Return a 1-D result if the input was 1-D
        if len(orig_shape) == 1:
            result = result.flatten()
        return result

    def forward(self, X):
        """
        Forward propagation

        Computation flow:
        input -> linear transform -> ReLU -> linear transform -> Softmax -> output

        Args:
            X: input data, shape (batch_size, input_size)
        Returns:
            output probabilities, shape (batch_size, output_size)
        """
        # Layer 1: input -> hidden layer
        z1 = np.dot(X, self.W1) + self.b1  # linear transform
        h1 = self.relu(z1)                  # ReLU activation

        # Layer 2: hidden layer -> output layer
        z2 = np.dot(h1, self.W2) + self.b2  # linear transform
        output = self.softmax(z2)            # Softmax activation

        # Cache intermediate results for backpropagation
        self.cache = {
            'X': X, 'z1': z1, 'h1': h1, 'z2': z2, 'output': output
        }

        return output

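    def compute_loss(self, y_pred, y_true):
        """
        Compute the cross-entropy loss.

        Formula: L = -sum(log(y_pred[i, y_true[i]])) / batch_size

        y_true holds integer class labels (not one-hot vectors).
        """
        batch_size = y_pred.shape[0]
        # The 1e-7 term guards against log(0)
        correct_logprobs = -np.log(y_pred[range(batch_size), y_true] + 1e-7)
        loss = np.sum(correct_logprobs) / batch_size
        return loss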

    def backward(self, y_true):
        """
        Backpropagation

        Computes the gradient of the loss with respect to every parameter.
        y_true holds integer class labels.
        """
        X = self.cache['X']
        z1 = self.cache['z1']
        h1 = self.cache['h1']
        output = self.cache['output']

        batch_size = X.shape[0]

        # Output-layer gradient (softmax + cross-entropy simplifies to: output - one_hot(y_true))
        dz2 = output.copy()
        dz2[np.arange(batch_size), y_true] -= 1
        dz2 /= batch_size  # average over the batch, matching compute_loss

        # Gradients of W2 and b2
        dW2 = np.dot(h1.T, dz2)
        db2 = np.sum(dz2, axis=0)

        # Hidden-layer gradient
        dh1 = np.dot(dz2, self.W2.T)
        dz1 = dh1 * self.relu_derivative(z1)

        # Gradients of W1 and b1
        dW1 = np.dot(X.T, dz1)
        db1 = np.sum(dz1, axis=0)

        return {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2}

    def update_params(self, grads, learning_rate):
        """Update the parameters (gradient descent)."""
        self.W1 -= learning_rate * grads['W1']
        self.b1 -= learning_rate * grads['b1']
        self.W2 -= learning_rate * grads['W2']
        self.b2 -= learning_rate * grads['b2']

    def predict(self, X):
        """Predict class labels."""
        y_pred = self.forward(X)
        return np.argmax(y_pred, axis=1)

    def accuracy(self, X, y):
        """Compute the classification accuracy."""
        predictions = self.predict(X)
        return np.mean(predictions == y)

def train_epoch(model, X, y, batch_size, learning_rate):
    """
    Train for one epoch.

    Args:
        model: the MLP model
        X: training data
        y: training labels
        batch_size: batch size
        learning_rate: learning rate
    Returns:
        the average loss over all batches
    """
    num_samples = X.shape[0]
    # Integer division: a final partial batch, if any, is skipped
    num_batches = num_samples // batch_size
    total_loss = 0

    # Shuffle the data
    indices = np.random.permutation(num_samples)
    X_shuffled = X[indices]
    y_shuffled = y[indices]

    for i in range(num_batches):
        start_idx = i * batch_size
        end_idx = start_idx + batch_size
        X_batch = X_shuffled[start_idx:end_idx]
        y_batch = y_shuffled[start_idx:end_idx]

        # Forward propagation
        y_pred = model.forward(X_batch)

        # Compute the loss
        loss = model.compute_loss(y_pred, y_batch)
        total_loss += loss

        # Backpropagation
        grads = model.backward(y_batch)

        # Update the parameters
        model.update_params(grads, learning_rate)

    return total_loss / num_batches

MNIST Training Script (train.py)

Python

"""
MNIST training script

Trains the NumPy MLP on the MNIST dataset
"""

import numpy as np
import pickle
import gzip
import os
from urllib import request
from mlp_numpy import MLP, train_epoch

def download_mnist(data_dir='./data'):
    """Download the MNIST dataset."""
    os.makedirs(data_dir, exist_ok=True)

    # Original MNIST host; it may be unavailable, in which case the
    # except branch below runs and the caller falls back to mock data.
    url = 'http://yann.lecun.com/exdb/mnist/'
    files = {
        'train_images': 'train-images-idx3-ubyte.gz',
        'train_labels': 'train-labels-idx1-ubyte.gz',
        'test_images': 't10k-images-idx3-ubyte.gz',
        'test_labels': 't10k-labels-idx1-ubyte.gz'
    }

    paths = {}
    for key, filename in files.items():
        filepath = os.path.join(data_dir, filename)
        paths[key] = filepath

        if not os.path.exists(filepath):
            print(f"Downloading {filename}...")
            try:  # catch download errors so the program does not crash
                request.urlretrieve(url + filename, filepath)
                print(f"Download complete: {filename}")
            except Exception as e:
                print(f"Download failed: {e}")
                print("Please download the MNIST dataset manually and place it in the data directory")
                if os.path.exists(filepath):
                    os.remove(filepath)  # discard a partial download
                return None

    return paths

def load_mnist(data_dir='./data'):
    """Load the MNIST dataset (tries to download it if it is not available locally)."""
    cache_file = os.path.join(data_dir, 'mnist.pkl')
    if os.path.exists(cache_file):
        print("Loading MNIST data from cache...")
        with open(cache_file, 'rb') as f:  # the with block closes the file automatically
            return pickle.load(f)

    paths = download_mnist(data_dir)
    if paths is None:
        # Mock data is random noise, so accuracy will stay near chance level (about 10%)
        print("Using mock data...")
        return generate_mock_mnist()

    print("Loading MNIST data...")

    def load_images(filepath):
        with gzip.open(filepath, 'rb') as f:
            # Skip the 16-byte IDX header
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        data = data.reshape(-1, 28*28) / 255.0
        # Standardize with the commonly used MNIST mean and standard deviation
        return (data - 0.1307) / 0.3081

    def load_labels(filepath):
        with gzip.open(filepath, 'rb') as f:
            # Skip the 8-byte IDX header
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        return data

    X_train = load_images(paths['train_images'])
    y_train = load_labels(paths['train_labels'])
    X_test = load_images(paths['test_images'])
    y_test = load_labels(paths['test_labels'])

    print(f"Training set: {X_train.shape[0]} samples")
    print(f"Test set: {X_test.shape[0]} samples")

    with open(cache_file, 'wb') as f:
        pickle.dump((X_train, y_train, X_test, y_test), f)

    return X_train, y_train, X_test, y_test

def generate_mock_mnist():
    """Generate mock MNIST data (for testing only)."""
    np.random.seed(42)

    X_train = np.random.uniform(0, 1, (1000, 784))
    X_train = (X_train - 0.1307) / 0.3081
    y_train = np.random.randint(0, 10, 1000)

    X_test = np.random.uniform(0, 1, (200, 784))
    X_test = (X_test - 0.1307) / 0.3081
    y_test = np.random.randint(0, 10, 200)

    print("Using mock MNIST data (for testing only)")
    print(f"Training set: {X_train.shape[0]} samples")
    print(f"Test set: {X_test.shape[0]} samples")

    return X_train, y_train, X_test, y_test

def train_model(model, X_train, y_train, X_test, y_test,
                epochs=10, batch_size=64, learning_rate=0.1,
                early_stopping_patience=5, min_delta=0.001):
    """Train the model (with early stopping and learning rate scheduling)."""
    print("\n" + "="*60)
    print("Starting training")
    print("="*60)

    best_acc = 0
    best_epoch = 0
    patience_counter = 0
    history = {
        'train_loss': [], 'train_acc': [],
        'test_acc': [], 'learning_rate': []
    }

    for epoch in range(epochs):
        # Learning rate schedule: cosine annealing
        current_lr = learning_rate * 0.5 * (1 + np.cos(np.pi * epoch / epochs))

        train_loss = train_epoch(model, X_train, y_train, batch_size, current_lr)
        # Training accuracy is estimated on the first 1000 samples to save time
        train_acc = model.accuracy(X_train[:1000], y_train[:1000])
        test_acc = model.accuracy(X_test, y_test)

        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['test_acc'].append(test_acc)
        history['learning_rate'].append(current_lr)

        print(f"Epoch {epoch+1}/{epochs} | "
              f"LR: {current_lr:.6f} | "
              f"Loss: {train_loss:.4f} | "
              f"Train Acc: {train_acc:.4f} | "
              f"Test Acc: {test_acc:.4f}")

        # Note: for simplicity, early stopping and checkpoint selection use the
        # test set; a separate validation split would give an unbiased test score.
        if test_acc > best_acc + min_delta:
            best_acc = test_acc
            best_epoch = epoch + 1
            patience_counter = 0
            np.savez('best_model.npz',
                    W1=model.W1, b1=model.b1,
                    W2=model.W2, b2=model.b2)
            print(f"  → New best model! Test accuracy: {best_acc:.4f}")
        else:
            patience_counter += 1
            if patience_counter >= early_stopping_patience:
                print(f"\nEarly stopping triggered: no improvement for {early_stopping_patience} consecutive epochs")
                break

    print(f"Training complete! Best test accuracy: {best_acc:.4f} (epoch {best_epoch})")
    return history

def main():
    """Entry point."""
    print("MNIST handwritten digit recognition - NumPy implementation")

    X_train, y_train, X_test, y_test = load_mnist()

    model = MLP(input_size=784, hidden_size=128, output_size=10)

    history = train_model(model, X_train, y_train, X_test, y_test,
                         epochs=10, batch_size=64, learning_rate=0.1)

    # Overall accuracy of the final-epoch weights (the best checkpoint is saved in best_model.npz)
    acc = model.accuracy(X_test, y_test)
    print(f"\nFinal test accuracy: {acc:.4f} ({acc*100:.2f}%)")

if __name__ == "__main__":
    main()
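The cosine annealing schedule used in train_model can be inspected in isolation. The sketch below uses the same formula with the script's defaults (base rate 0.1, 10 epochs); the helper name `cosine_annealing` is an assumption for illustration.

```python
import numpy as np

def cosine_annealing(base_lr, epoch, total_epochs):
    """Same formula as in train_model: half a cosine period from base_lr toward 0."""
    return base_lr * 0.5 * (1 + np.cos(np.pi * epoch / total_epochs))

lrs = [cosine_annealing(0.1, epoch, 10) for epoch in range(10)]
for epoch, lr in enumerate(lrs):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

The rate starts at the base value, passes through half of it at the midpoint, and decays smoothly. Because the loop index stops at `total_epochs - 1`, the rate stays slightly above zero in the final epoch.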