In the world of machine learning, handwritten digit recognition is a classic example of supervised learning and an entry point to neural network modeling. By recognizing digits, we not only delve into the basics of neural networks but also understand fundamental operations that are applicable to a wide variety of problems in image recognition. In this article, we’ll walk through building a digit recognizer using a neural network with just a few lines of code in Python and explore the step-by-step logic behind this model.
Prerequisites
You’ll need basic knowledge of Python, NumPy, and Matplotlib for this project. For data processing and visualization, we’ll use the pandas, numpy, and matplotlib libraries.
Step 1: Import Libraries and Load Data
First, we import the essential libraries and load our data, which consists of handwritten digit images in pixel format. Each row in the dataset represents one image, with the label column indicating the digit (0-9) and the other columns containing pixel values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the training data (one image per row, first column is the label)
data = pd.read_csv('/content/train.csv')

# Inspect the first rows and the distribution of digit labels
data.head()
data['label'].hist()
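Before going further, it helps to look at an individual sample. The sketch below, which assumes the usual Kaggle layout where every column after label holds a pixel value (pixel0 through pixel783), reshapes one row into a 28x28 image and displays it:

# Display a single training example (assumes columns: label, pixel0..pixel783)
sample = data.iloc[0]
pixels = sample.drop('label').values.reshape(28, 28)
plt.imshow(pixels, cmap='gray')
plt.title(f"Label: {sample['label']}")
plt.show()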
Step 2: Data Cleanliness Check
Before training, we check for missing or extreme values that might affect the model’s performance. The function below warns us if the dataset contains NaN values, infinite values, or values outside a defined threshold.
def check_data_cleanliness(data):
    is_clean = True
    if np.isnan(data).any():
        print("Warning: Data contains NaN values.")
        is_clean = False
    if np.isinf(data).any():
        print("Warning: Data contains infinite values.")
        is_clean = False
    extreme_threshold = 1e3
    if np.any(np.abs(data) > extreme_threshold):
        print(f"Warning: Data contains values outside the range -{extreme_threshold} to {extreme_threshold}.")
        is_clean = False
    if is_clean:
        print("Data is clean (all values are finite and within expected range).")
# Convert the DataFrame to a NumPy array and record its dimensions
data = np.array(data)
m, n = data.shape  # m = number of images, n = 1 label column + 784 pixel columns

check_data_cleanliness(data)
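As a quick illustration (not part of the original pipeline), calling the function on a small made-up array that contains a NaN should print the corresponding warning:

# Hypothetical toy input: the NaN check should fire here
check_data_cleanliness(np.array([1.0, np.nan, 3.0]))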
Step 3: Prepare Training and Test Data
To train and evaluate our model, we split the dataset into training and test sets, holding out the first 1,000 rows for testing. Each split is transposed so that every column is one image, and the pixel values are normalized by dividing by 255 (since they range from 0 to 255).
# Hold out the first 1,000 rows as the test set; transpose so each column is one image
data_test = data[:1000].T
Y_test = data_test[0]           # first row: labels
X_test = data_test[1:n] / 255.  # remaining rows: pixel values scaled to [0, 1]

# Use the remaining rows for training
data_train = data[1000:m].T
Y_train = data_train[0]
X_train = data_train[1:n] / 255.
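A quick shape check, an optional sanity test rather than part of the pipeline, confirms the layout the later functions expect: pixel matrices of shape (784, number of images) and label vectors with one entry per image.

# Optional sanity check on the array shapes
print(X_train.shape, Y_train.shape)  # expected: (784, m - 1000) and (m - 1000,)
print(X_test.shape, Y_test.shape)    # expected: (784, 1000) and (1000,)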
Step 4: Initialize Parameters
We initialize the weights (w1, w2) and biases (b1, b2) with small random values to break symmetry between neurons during training.
def initialize_parameters():
    # Layer 1: 784 inputs -> 10 hidden neurons; Layer 2: 10 hidden -> 10 outputs
    w1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    w2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return w1, b1, w2, b2
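This gives a small network with a single hidden layer: 784 input pixels feed 10 hidden neurons, which feed 10 output neurons, one per digit. Subtracting 0.5 shifts the uniform random values into the range -0.5 to 0.5, so they are centered around zero. An optional check on the shapes:

# Optional: confirm the parameter shapes for the 784 -> 10 -> 10 network
print([p.shape for p in initialize_parameters()])
# [(10, 784), (10, 1), (10, 10), (10, 1)]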
Step 5: Define Activation Functions
We use the ReLU function for the hidden layer and the Softmax function for the output layer. ReLU passes positive values through unchanged and zeroes out negative ones, giving a clear distinction between activated and non-activated neurons, while Softmax turns the output scores into a probability distribution over the ten digits.
def ReLU(M):
    # Element-wise ReLU: negative values become 0, positive values pass through
    return np.maximum(M, 0)

def softmax(Z):
    # Column-wise softmax: each column becomes a probability distribution over the 10 digits
    A = np.exp(Z) / np.sum(np.exp(Z), axis=0)
    return A
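A small illustrative check with made-up scores shows what these functions do: ReLU zeroes out the negative entries, and every softmax column sums to 1.

# Made-up scores, one column per example
Z_demo = np.array([[ 2.0, -1.0],
                   [ 0.5,  0.0],
                   [-3.0,  1.5]])
print(ReLU(Z_demo))                 # negatives replaced by 0
print(softmax(Z_demo).sum(axis=0))  # each column sums to 1.0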
Step 6: Forward Propagation
In this step, we calculate the outputs of each layer by performing matrix multiplication between inputs and weights, adding the bias, and applying the activation functions.
def fpropagation(w1, b1, w2, b2, X):
    z1 = w1.dot(X) + b1   # hidden layer pre-activation
    a1 = ReLU(z1)         # hidden layer activation
    z2 = w2.dot(a1) + b2  # output layer pre-activation
    a2 = softmax(z2)      # output probabilities
    return a1, z1, a2, z2
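With the shapes used here, every column stays one image throughout the forward pass. The optional trace below (not part of the original code) runs freshly initialized parameters on a small batch of five test images to confirm this:

# Optional shape trace on a batch of 5 test images (illustration only)
tw1, tb1, tw2, tb2 = initialize_parameters()
ta1, tz1, ta2, tz2 = fpropagation(tw1, tb1, tw2, tb2, X_test[:, :5])
print(tz1.shape, ta2.shape)  # (10, 5) (10, 5)
print(ta2.sum(axis=0))       # each column of probabilities sums to roughly 1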
Step 7: One-Hot Encoding
For training, we convert the labels into a one-hot encoded format: each label becomes a vector with a 1 in the position of the correct digit and 0s everywhere else, so the model can learn each class separately.
def one_hot_encode(Y):
    # One row per example, one column per class; then transpose to (classes, examples)
    one_hot_y = np.zeros((Y.size, Y.max() + 1))
    one_hot_y[np.arange(Y.size), Y] = 1
    return one_hot_y.T
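A quick example with made-up labels shows the resulting layout: one column per label, with a single 1 marking the correct digit.

# Example: labels 2, 0, 1 become columns of a 3x3 matrix
print(one_hot_encode(np.array([2, 0, 1])))
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]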
Step 8: Backpropagation
In the backpropagation function, we calculate the gradients of the loss with respect to the weights and biases, which are used to update the parameters in the next step. The output-layer error dz2 is simply a2 minus the one-hot labels (a consequence of pairing softmax with cross-entropy loss), and the factor (z1 > 0) is the derivative of ReLU.
def bpropagation(a1, z1, a2, z2, w2, X, Y):
    _, n = X.shape                  # n = number of examples in the batch
    y_encoded = one_hot_encode(Y)
    dz2 = a2 - y_encoded            # output error (softmax + cross-entropy derivative)
    dw2 = 1 / n * dz2.dot(a1.T)
    db2 = 1 / n * np.sum(dz2, axis=1, keepdims=True)
    dz1 = w2.T.dot(dz2) * (z1 > 0)  # (z1 > 0) is the ReLU derivative
    dw1 = 1 / n * dz1.dot(X.T)
    db1 = 1 / n * np.sum(dz1, axis=1, keepdims=True)
    return dw1, db1, dw2, db2
Step 9: Parameter Update
Using the calculated gradients, we update the weights and biases by moving each one a small step, scaled by the learning rate alpha, in the direction that reduces the loss.
def update_parameters(w1, b1, w2, b2, dw1, db1, dw2, db2, alpha):
    # Gradient descent step: move each parameter against its gradient
    w1 -= alpha * dw1
    b1 -= alpha * db1
    w2 -= alpha * dw2
    b2 -= alpha * db2
    return w1, b1, w2, b2
Step 10: Training the Model
We run multiple iterations of forward and backward propagation to train the model, updating the parameters at every step and printing the training accuracy every 100 iterations. Two small helpers, predictions and accuracy, convert the softmax outputs into predicted digits and compare them against the labels.
def predictions(a2):
    # Predicted digit = index of the highest probability in each column
    return np.argmax(a2, axis=0)

def accuracy(preds, Y):
    return np.sum(preds == Y) / Y.size

def gradient_descent(X, Y, iterations, alpha):
    w1, b1, w2, b2 = initialize_parameters()
    for i in range(iterations):
        a1, z1, a2, z2 = fpropagation(w1, b1, w2, b2, X)
        dw1, db1, dw2, db2 = bpropagation(a1, z1, a2, z2, w2, X, Y)
        w1, b1, w2, b2 = update_parameters(w1, b1, w2, b2, dw1, db1, dw2, db2, alpha)
        if i % 100 == 0:
            print('Iteration:', i)
            print('Accuracy:', accuracy(predictions(a2), Y))
    return w1, b1, w2, b2

w1, b1, w2, b2 = gradient_descent(X_train, Y_train, 1000, 0.1)
Step 11: Testing
We test the model’s accuracy on unseen data, displaying individual test images along with the model’s predicted and actual labels. The make_predictions helper runs a forward pass and returns the predicted digit for each image. Our model achieves an impressive 87% accuracy on the test set, which is promising for a basic neural network built from scratch.
def make_predictions(X, w1, b1, w2, b2):
    # Forward pass, then pick the most probable digit for each column of X
    _, _, a2, _ = fpropagation(w1, b1, w2, b2, X)
    return predictions(a2)

def test(index, w1, b1, w2, b2):
    curr_img = X_test[:, index, None]
    prediction = make_predictions(curr_img, w1, b1, w2, b2)
    label = Y_test[index]
    print('Prediction:', prediction)
    print('Actual:', label)
    plt.imshow(curr_img.reshape((28, 28)) * 255, cmap='gray')
    plt.show()

test(42, w1, b1, w2, b2)
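Beyond spot-checking individual images, a short optional check computes the accuracy over the entire 1,000-image test set using the same helpers:

# Optional: overall accuracy on the held-out test set
test_preds = make_predictions(X_test, w1, b1, w2, b2)
print('Test accuracy:', accuracy(test_preds, Y_test))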
Conclusion
Building a digit recognizer provides a foundation for understanding neural networks and deep learning. This project showcases how to implement core neural network operations from scratch in Python, giving you insights into every layer’s role in transforming input into output. The simplicity of this model enables you to grasp its concepts, and with these basics, you’re well-equipped to experiment with more complex architectures in image recognition.
For more details and access to the dataset, visit Kaggle’s Digit Recognizer dataset.