Building a Digit Recognizer Using Neural Networks in Python

November 22, 2024

In the world of machine learning, handwritten digit recognition is a classic example of supervised learning and an entry point to neural network modeling. By recognizing digits, we not only delve into the basics of neural networks but also understand fundamental operations that are applicable to a wide variety of problems in image recognition. In this article, we’ll walk through building a digit recognizer using a neural network with just a few lines of code in Python and explore the step-by-step logic behind this model.

Prerequisites

You’ll need basic knowledge of Python, NumPy, and Matplotlib for this project. For data processing and visualization, we’ll use the pandas, numpy, and matplotlib libraries.

Step 1: Import Libraries and Load Data

First, we import the essential libraries and load our data, which consists of handwritten digit images in pixel format. Each row in the dataset represents one image, with the label column indicating the digit (0-9) and the other columns containing pixel values.

				
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('/content/train.csv')
data.head()
data['label'].hist()
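
As a quick sanity check of the structure described above, you can print the shape of the DataFrame. For Kaggle’s Digit Recognizer train.csv this is expected to be 42,000 rows by 785 columns (one label column plus 784 pixel columns for a 28x28 image); the exact row count and column names are assumptions about that particular file.

# Quick structural check: expect a 'label' column plus 784 pixel columns (28 * 28 = 784).
print(data.shape)                      # e.g. (42000, 785) for the Kaggle train.csv
print(data.columns[:3].tolist())       # e.g. ['label', 'pixel0', 'pixel1']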
				
			

Step 2: Data Cleanliness Check

Before training, we check for missing or extreme values that might affect the model’s performance. The function below warns us if the dataset contains NaN values, infinite values, or values outside a defined threshold.

				
def check_data_cleanliness(data):
    clean = True
    if np.isnan(data).any():
        print("Warning: Data contains NaN values.")
        clean = False
    if np.isinf(data).any():
        print("Warning: Data contains infinite values.")
        clean = False
    extreme_threshold = 1e3
    if np.any(np.abs(data) > extreme_threshold):
        print(f"Warning: Data contains values outside the range -{extreme_threshold} to {extreme_threshold}.")
        clean = False
    if clean:
        print("Data is clean (all values are finite and within expected range).")

# Convert the DataFrame to a NumPy array: m rows (images), n columns (label + 784 pixels).
data = np.array(data)
m, n = data.shape
check_data_cleanliness(data)
				
			

Step 3: Prepare Training and Test Data

To train and evaluate our model, we split the dataset into training and test sets, normalizing the data by dividing the pixel values by 255 (since pixel values range from 0 to 255).

				
# Hold out the first 1,000 rows as a test set; transpose so each column is one image.
data_test = data[:1000].T
Y_test = data_test[0]
X_test = data_test[1:n] / 255.

# Use the remaining rows for training, normalized the same way.
data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n] / 255.
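
Because the split and transpose above are easy to get backwards, a brief shape check helps confirm that each column of X_train is one image and that pixel values now lie between 0 and 1 (the 41,000 figure assumes the 42,000-row Kaggle file).

# After transposing, columns are images: X is (784, num_images), Y is (num_images,).
print(X_train.shape, Y_train.shape)    # expected: (784, 41000) (41000,)
print(X_test.shape, Y_test.shape)      # expected: (784, 1000) (1000,)
print(X_train.min(), X_train.max())    # expected: values between 0.0 and 1.0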
				
			

Step 4: Initialize Parameters

We initialize the weights (w1, w2) and biases (b1, b2) with small random values to break symmetry during training.

				
def initialize_parameters():
    # Hidden layer: w1 is (10, 784), b1 is (10, 1); output layer: w2 is (10, 10), b2 is (10, 1).
    # Random values in [-0.5, 0.5) break the symmetry between neurons.
    w1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    w2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return w1, b1, w2, b2
				
			

Step 5: Define Activation Functions

We use the ReLU function for the hidden layer and the softmax function for the output layer. ReLU keeps only positive activations, while softmax converts the output scores into a probability distribution over the ten digit classes.

				
def ReLU(M):
    # Element-wise max(x, 0): negative values become 0, positives pass through unchanged.
    return np.maximum(M, 0)

def softmax(Z):
    # Subtract each column's max for numerical stability, then normalize so
    # every column sums to 1 (a probability distribution over the ten digits).
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)
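
A minimal sanity check of these two functions: ReLU should zero out negatives, and each column of the softmax output should sum to 1.

# ReLU zeroes negatives and keeps positives unchanged.
print(ReLU(np.array([-2.0, 0.0, 3.0])))           # [0. 0. 3.]

# Each column of the softmax output is a probability distribution (sums to 1).
Z_demo = np.random.randn(10, 4)                   # 10 classes, 4 fake examples
print(softmax(Z_demo).sum(axis=0))                # ~[1. 1. 1. 1.]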

				
			

Step 6: Forward Propagation

In this step, we calculate the outputs of each layer by performing matrix multiplication between inputs and weights, adding the bias, and applying the activation functions.

				
def fpropagation(w1, b1, w2, b2, X):
    z1 = w1.dot(X) + b1    # hidden-layer pre-activations
    a1 = ReLU(z1)          # hidden-layer activations
    z2 = w2.dot(a1) + b2   # output-layer pre-activations
    a2 = softmax(z2)       # class probabilities
    return a1, z1, a2, z2
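
To see the data flow, it can help to trace the shapes through one forward pass on a small random batch; this is only a sketch, assuming the 784-10-10 architecture initialized in Step 4.

# Trace activations through the network for a tiny random batch of 5 fake "images".
w1, b1, w2, b2 = initialize_parameters()
X_demo = np.random.rand(784, 5)
a1, z1, a2, z2 = fpropagation(w1, b1, w2, b2, X_demo)
print(z1.shape, a1.shape, z2.shape, a2.shape)     # (10, 5) for each layer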


				
			

Step 7: One-Hot Encoding

For training, we convert the labels into a one-hot encoded format: each label becomes a vector with a 1 in the position of the correct digit and 0 everywhere else, which we can compare directly against the softmax output.

				
def one_hot_encode(Y):
    # Build a (num_samples, num_classes) matrix of zeros, place a 1 at each
    # sample's label, then transpose so each column corresponds to one example.
    one_hot_y = np.zeros((Y.size, Y.max() + 1))
    one_hot_y[np.arange(Y.size), Y] = 1
    return one_hot_y.T
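
A small usage example makes the encoding concrete. Note that the number of rows is Y.max() + 1, so this assumes the batch contains the highest digit present in your data (9 for the full dataset).

# Example: three labels become a (num_classes, num_samples) matrix of 0s and 1s.
print(one_hot_encode(np.array([0, 2, 1])))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]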


				
			

Step 8: Backpropagation

In the backpropagation function, we calculate gradients with respect to weights and biases, which are used for updating the parameters in the next step.

				
def bpropagation(a1, z1, a2, z2, w2, X, Y):
    _, n = X.shape                        # n = number of training examples
    y_encoded = one_hot_encode(Y)
    dz2 = a2 - y_encoded                  # gradient of cross-entropy loss w.r.t. z2
    dw2 = 1/n * dz2.dot(a1.T)
    db2 = 1/n * np.sum(dz2, axis=1, keepdims=True)
    dz1 = w2.T.dot(dz2) * (z1 > 0)        # (z1 > 0) is the derivative of ReLU
    dw1 = 1/n * dz1.dot(X.T)
    db1 = 1/n * np.sum(dz1, axis=1, keepdims=True)
    return dw1, db1, dw2, db2
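
Because backpropagation bugs are easy to introduce, a quick finite-difference check can compare one analytic gradient entry against a numerical estimate. The cross_entropy helper below is not part of the original model; it is only a verification sketch, and the fake labels include a 9 so the one-hot encoding has all ten rows.

# Numerical gradient check for a single entry of w2 (illustrative sketch only).
def cross_entropy(w1, b1, w2, b2, X, Y):
    _, _, a2, _ = fpropagation(w1, b1, w2, b2, X)
    return -np.sum(one_hot_encode(Y) * np.log(a2 + 1e-9)) / X.shape[1]

w1, b1, w2, b2 = initialize_parameters()
X_demo = np.random.rand(784, 5)
Y_demo = np.array([3, 0, 9, 5, 7])                 # includes 9, so ten classes

a1, z1, a2, z2 = fpropagation(w1, b1, w2, b2, X_demo)
dw1, db1, dw2, db2 = bpropagation(a1, z1, a2, z2, w2, X_demo, Y_demo)

eps = 1e-5
w2_plus, w2_minus = w2.copy(), w2.copy()
w2_plus[3, 4] += eps
w2_minus[3, 4] -= eps
numeric = (cross_entropy(w1, b1, w2_plus, b2, X_demo, Y_demo)
           - cross_entropy(w1, b1, w2_minus, b2, X_demo, Y_demo)) / (2 * eps)
print(numeric, dw2[3, 4])                          # the two values should be close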

				
			

Step 9: Parameter Update

Using the calculated gradients, we update the weights and biases, moving each parameter a small step (scaled by the learning rate alpha) in the direction that reduces the loss.

				
def update_parameters(w1, b1, w2, b2, dw1, db1, dw2, db2, alpha):
    w1 -= alpha * dw1
    b1 -= alpha * db1
    w2 -= alpha * dw2
    b2 -= alpha * db2
    return w1, b1, w2, b2

				
			

Step 10: Training the Model

We run multiple iterations of forward and backward propagation to train the model, updating the parameters at each step and printing the training accuracy every 100 iterations. The small predictions and accuracy helpers below convert the softmax output into digit predictions and measure how often they match the labels.

				
def predictions(a2):
    # The predicted digit is the class with the highest probability in each column.
    return np.argmax(a2, 0)

def accuracy(preds, Y):
    return np.sum(preds == Y) / Y.size

def gradient_descent(X, Y, iterations, alpha):
    w1, b1, w2, b2 = initialize_parameters()
    for i in range(iterations):
        a1, z1, a2, z2 = fpropagation(w1, b1, w2, b2, X)
        dw1, db1, dw2, db2 = bpropagation(a1, z1, a2, z2, w2, X, Y)
        w1, b1, w2, b2 = update_parameters(w1, b1, w2, b2, dw1, db1, dw2, db2, alpha)
        if i % 100 == 0:
            print('Iteration:', i)
            print('Accuracy:', accuracy(predictions(a2), Y))
    return w1, b1, w2, b2

w1, b1, w2, b2 = gradient_descent(X_train, Y_train, 1000, 0.1)

				
			

Step 11: Testing

We test the model’s accuracy on unseen data, displaying a chosen test image along with the model’s predicted and actual labels. The make_predictions helper runs a forward pass and returns the predicted digit. Our model achieves an impressive 87% accuracy on the test set, which is promising for a basic neural network built from scratch.

				
def make_predictions(X, w1, b1, w2, b2):
    # Run a forward pass and convert the class probabilities into digit predictions.
    _, _, a2, _ = fpropagation(w1, b1, w2, b2, X)
    return predictions(a2)

def test(index, w1, b1, w2, b2):
    curr_img = X_test[:, index, None]
    prediction = make_predictions(curr_img, w1, b1, w2, b2)
    label = Y_test[index]
    print('Prediction:', prediction)
    print('Actual:', label)
    plt.imshow(curr_img.reshape((28, 28)) * 255, cmap='gray')
    plt.show()

test(42, w1, b1, w2, b2)
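
To reproduce an overall accuracy figure rather than spot-checking single images, you can run make_predictions over the whole held-out split; the exact number will vary with the random initialization.

# Evaluate on the full held-out test split (result varies from run to run).
test_preds = make_predictions(X_test, w1, b1, w2, b2)
print('Test accuracy:', accuracy(test_preds, Y_test))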
				
			

Conclusion

Building a digit recognizer provides a foundation for understanding neural networks and deep learning. This project showcases how to implement core neural network operations from scratch in Python, giving you insights into every layer’s role in transforming input into output. The simplicity of this model enables you to grasp its concepts, and with these basics, you’re well-equipped to experiment with more complex architectures in image recognition.

For more details and access to the dataset, visit Kaggle’s Digit Recognizer dataset.
