Handwritten Digit Recognition by Convolutional Neural Network

This is a demonstration of my JavaScript-based Convolutional Neural Network. Draw a digit from 0 to 9 in the left box, and the network will attempt to recognize it. The system evaluates your drawing after each stroke (mouse button up), so expect incorrect intermediate results if you draw your digit using more than one stroke. Because the pre-processor records and scales your strokes rather than downsampling the canvas bitmap directly, a digit built up from short dashes or dots may not be classified accurately.

The associated training page for this network is here: digittraining.html

Note: This demo uses ES6 features and will not run in Safari 9.x. Try Chrome 52+ or Firefox 50+.

But this has been done before...

Yes, well, sort of. MNIST is a classic dataset now, and one that thousands have tested CNNs on. There are a couple of other demos similar to this one on the web already, but what makes this one a little different is that I've coded the full CNN in JavaScript--forward and backward passes--and the demo above was trained on the same codebase it's running on now. The other online demos I've found that let you draw digits for recognition have only the forward pass implemented in JavaScript; their networks were trained externally in a tool such as MATLAB or TensorFlow and then imported into JS. Andrej Karpathy's ConvNetJS is the only other complete JavaScript CNN implementation that I'm aware of.

Because my motivation for doing this project is to learn how these things work at the most fundamental level, and because deriving and implementing the gradient descent training of the network is where the real math is, I decided not to cut any corners and to intentionally re-invent the wheel for my own understanding. You can see my somewhat crude training page here: digittraining.html. This network instance has been trained on the MNIST set of 60,000 handwritten digit images, and scores approximately 98% accuracy on the associated set of 10,000 validation images.

So, what is this doing?

The images you draw in the box above are being fed into a Convolutional Neural Network that I wrote in JavaScript/ES6 and trained on the MNIST dataset of handwritten digits. The network consists of digital filters that started out (prior to training) initialized with random values in their kernels. The network "learns" to distinguish the features of digits by a negative feedback process known as gradient descent. Labeled example digits are fed into the network, and any error in their classification is used to tune the network--making small adjustments to values in the filter kernels and to the weights of connections to the output layer--to produce a more accurate score. This is repeated for tens of thousands of example digit images, until the network has converged on a set of filters that can accurately differentiate between all 10 digits.
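
As a rough illustration of the tuning step described above, here is a minimal sketch of one gradient descent update for a soft-max output layer. It is not the code running on this page: the function names (softmax, sgdStep), the learning rate parameter, and the choice of a cross-entropy loss are all assumptions made for the example.

```javascript
// Hypothetical helper: numerically stable soft-max over an array of scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// One gradient descent step for a fully-connected soft-max layer.
// weights[k][i] connects input i to output class k; biases[k] is per class.
// Assumption: cross-entropy loss, for which dLoss/dScore[k] = p[k] - target[k].
function sgdStep(weights, biases, input, label, learningRate) {
  // Forward pass: one score per class, then class probabilities.
  const scores = weights.map((row, k) =>
    row.reduce((acc, w, i) => acc + w * input[i], biases[k])
  );
  const probs = softmax(scores);

  // Backward pass: nudge each weight against its error gradient.
  for (let k = 0; k < weights.length; k++) {
    const grad = probs[k] - (k === label ? 1 : 0);
    for (let i = 0; i < input.length; i++) {
      weights[k][i] -= learningRate * grad * input[i];
    }
    biases[k] -= learningRate * grad;
  }
  return probs; // probabilities computed before the update
}
```

Calling sgdStep repeatedly with the same input and label should raise the probability assigned to that label, which is the "small adjustment" idea in miniature. In the full network the same error signal is also propagated back into the filter kernels.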

This network's architecture

The network analyzing the digits consists of:

The input image pre-processor, which crops your drawing down to a 24x24 pixel input image, redrawing it with a thinner stroke if it does not fill the box
A convolution layer with (10) 5x5x1 filter kernels, ReLU activation
A modified¹ max pooling layer, 2x2 with strides of 2
A convolution layer with (20) 5x5x10 filter kernels, ReLU activation
A modified max pooling layer, 2x2 with strides of 2
A fully-connected layer with 10 units and soft-max activation
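
To make the layer list above more concrete, the sketch below traces the shape of the data as it moves through the network. The page does not say whether the convolution layers pad their inputs, so padding is left as a parameter; the function names (outputSize, traceShapes) are hypothetical and this is not the demo's own code.

```javascript
// Output width/height for a square convolution or pooling window.
// `padding` is an assumption: the page does not state whether inputs are padded.
function outputSize(inputSize, windowSize, stride, padding) {
  return Math.floor((inputSize + 2 * padding - windowSize) / stride) + 1;
}

// Trace [width, height, depth] through the layers listed above.
function traceShapes(padding) {
  const shapes = [];
  let size = 24;
  shapes.push([size, size, 1]); // 24x24 input image, one channel

  size = outputSize(size, 5, 1, padding);
  shapes.push([size, size, 10]); // convolution: 10 filters of 5x5x1

  size = outputSize(size, 2, 2, 0);
  shapes.push([size, size, 10]); // max pooling: 2x2, stride 2

  size = outputSize(size, 5, 1, padding);
  shapes.push([size, size, 20]); // convolution: 20 filters of 5x5x10

  size = outputSize(size, 2, 2, 0);
  shapes.push([size, size, 20]); // max pooling: 2x2, stride 2

  shapes.push([1, 1, 10]); // fully-connected soft-max layer, one unit per digit
  return shapes;
}

console.log(traceShapes(0)); // no padding
console.log(traceShapes(2)); // padding of 2 keeps each convolution's output size
```

Without padding, the spatial size shrinks 24 → 20 → 10 → 6 → 3 before the fully-connected layer; with a padding of 2 it goes 24 → 24 → 12 → 12 → 6.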

During training, I used a small amount of input image augmentation, as suggested by Andrej Karpathy in the notes accompanying his MNIST example (http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html): a random 24x24 crop is taken from each of the 28x28 digit images. The network was trained using 2 passes over the dataset, a total of 120,000 impressions.
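
The random-crop augmentation can be sketched as follows. This is an illustrative version, not the training page's code: the function name randomCrop, the flat row-major pixel layout, and the injectable rng parameter are assumptions made for the example.

```javascript
// Take a random cropSize x cropSize window from a square grayscale image
// stored row-major in a flat array. `rng` defaults to Math.random and can be
// replaced to make the crop deterministic.
function randomCrop(pixels, srcSize, cropSize, rng = Math.random) {
  const maxOffset = srcSize - cropSize; // 4 when cropping 28 down to 24
  const dx = Math.floor(rng() * (maxOffset + 1)); // horizontal offset, 0..maxOffset
  const dy = Math.floor(rng() * (maxOffset + 1)); // vertical offset, 0..maxOffset
  const out = new Float32Array(cropSize * cropSize);
  for (let y = 0; y < cropSize; y++) {
    for (let x = 0; x < cropSize; x++) {
      out[y * cropSize + x] = pixels[(y + dy) * srcSize + (x + dx)];
    }
  }
  return { pixels: out, dx, dy };
}
```

With a 28x28 source there are 5 possible offsets along each axis, so each digit can be presented in up to 25 slightly shifted positions. Two passes over the 60,000 training images give the 120,000 impressions mentioned above.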

¹I've implemented the max pool layers so that, during training, error is backpropagated evenly to all input pixels that attained the maximum when there is more than one, rather than the more traditional approach of passing it to just one of them.
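
A minimal sketch of the tie handling described in this footnote, for a single 2x2 pooling window. It is an illustration under assumptions, not the demo's code: the function name maxPoolBackwardWindow is hypothetical, and "evenly" is interpreted here as dividing the error equally among the tied inputs.

```javascript
// Backward pass for one 2x2 max pooling window.
// `inputs` holds the four values seen in the forward pass and `gradOut` is
// the error arriving at the window's single output.
// Assumption: "evenly" means the error is divided equally among tied maxima,
// so the total error passed back stays the same.
function maxPoolBackwardWindow(inputs, gradOut) {
  const max = Math.max(...inputs);
  const ties = inputs.filter((v) => v === max).length;
  return inputs.map((v) => (v === max ? gradOut / ties : 0));
}

console.log(maxPoolBackwardWindow([1, 3, 3, 0], 1)); // two inputs share the maximum
console.log(maxPoolBackwardWindow([5, 1, 2, 3], 3)); // a single maximum gets all the error
```

Ties are not as rare as they might sound: because the pooling layers follow ReLU activations, a window can contain nothing but zeros, in which case all four inputs share the maximum.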