CMSC 483/691c, Parallel Programming
Project 1: Neural Nets with Threads
(c) 2001, Howard E. Motteler
Assigned Thu 15 Feb; Due Thu 8 Mar; 100 pts
Project Goals
The purpose of this project is to get hands-on experience
programming matrix operations with threads and to provide an
introduction to neural networks and back-propagation training.
The Project
You are to write two programs using threads, one to do a three-layer
"forward" neural network calculation, and one to train a three-layer
network by back-propagation. The two programs may share procedures
and/or headers.
Forward Calculation
A three-layer forward calculation can be represented as
y = f(x) = W3*t(W2*t(W1*x + b1) + b2) + b3
where x is a column vector input, y a column vector output, W1, W2,
and W3 are matrices, b1, b2, and b3 are vectors, and t() represents
the application of the hyperbolic tangent function to each element
of a matrix or vector. The function f can be generalized to an
operation from matrices X to matrices Y, if we duplicate columns of
the bias vectors so that the additions conform.
Your forward program will read the X matrix, the W matrices and b
vectors, and some stats on X and Y used to normalize the data, and
will calculate and then write the Y matrix. All the matrices and
vectors are saved as simple ASCII-format data, in column order. Sample C procedures are provided to read and write
such data.
Your program to do forward calculations should be called "nnfwd",
and should
- read filenames for the input and output matrices (X and Y in
the formula above) and the number of threads from the command
line; if no number of threads is given, it should default to
the actual number of processors,
- read the input matrix (X in the formula, above),
- read the matrices "W1", "W2", and "W3", vectors "B1", "B2", and
"B3", and the training data statistics vectors "Xmean", "Xstd",
"Ymean", and "Ystd",
- rescale the input data: subtract the mean, and divide by the
standard deviation,
- do the three-layer forward calculation, as described above,
- rescale the output data: multiply by the standard deviation and
add the mean, and
- write out the result matrix (Y in the formula above).
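A natural way to thread the forward calculation is to give each thread a contiguous block of columns of X, since the columns are independent. The sketch below shows the partitioning and thread bookkeeping with POSIX threads; the per-column work is a stand-in (here just doubling a value), and the names job_t, worker, and run_threads are illustrative:

```c
#include <pthread.h>

/* One thread's share of the work: a half-open range of columns.
   In nnfwd, the struct would also carry pointers to the weight
   matrices and bias vectors. */
typedef struct {
    int lo, hi;              /* columns [lo, hi) belong to this thread */
    const double *x;         /* input columns (one row here, for brevity) */
    double *y;               /* output columns */
} job_t;

static void *worker(void *arg)
{
    job_t *j = arg;
    for (int c = j->lo; c < j->hi; c++)
        j->y[c] = 2.0 * j->x[c];   /* stand-in for the per-column forward calc */
    return NULL;
}

/* Split ncols columns as evenly as possible among nthreads threads. */
void run_threads(int nthreads, int ncols, const double *x, double *y)
{
    pthread_t tid[nthreads];
    job_t job[nthreads];
    for (int t = 0; t < nthreads; t++) {
        job[t].lo = t * ncols / nthreads;
        job[t].hi = (t + 1) * ncols / nthreads;
        job[t].x = x;
        job[t].y = y;
        pthread_create(&tid[t], NULL, worker, &job[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
}
```

Because each thread writes only its own columns of Y, no locking is needed in the forward calculation.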
A typical invocation would be
nnfwd X1 Y1 4
where "X1" and "Y1" are matrix file names, and Y1 is to be calculated
using 4 threads.
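For defaulting to the actual number of processors when no thread count is given, one option is the POSIX sysconf call; the helper name pick_nthreads and the argument position are illustrative:

```c
#include <stdlib.h>
#include <unistd.h>

/* Return the thread count from argv[arg_index] if present, otherwise
   default to the number of online processors (POSIX sysconf). */
int pick_nthreads(int argc, char **argv, int arg_index)
{
    if (argc > arg_index)
        return atoi(argv[arg_index]);
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;   /* fall back to 1 if sysconf fails */
}
```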
Training
The weights W and bias vectors B are chosen so that a forward
calculation gives a reasonable fit to some particular set of
training data. Let X and Y be matrices, and suppose that each column
of X represents an input and each column of Y a desired output of our
network. We want to find weight matrices and bias vectors such that
the output of the forward calculation F(X) is as close to the
supplied data Y as possible. This is done by "training" the
network.
If e is an error term, for example, e = |F(X) - Y|, then we want to
minimize e.
This is done by finding de/dw_ij for each weight
w_ij, adjusting each weight by a small
increment along the gradient, and checking the error with the
new weights. Details of this adjustment may vary; a sample training program is provided in
Matlab that uses both "momentum" and an "adaptive learning rate".
You do not have to use the training algorithm used in the demo
program nntrain.m; improvements are welcome.
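The per-weight update with momentum can be sketched as follows; this is only the gradient step itself (the adaptive learning rate of nntrain.m is omitted), and the name update_weights and the lr/mu parameters are illustrative:

```c
/* One gradient-descent step with momentum over n weights.
   grad[i] holds de/dw for weight i, dw[i] carries the previous
   step (the momentum term), lr is the learning rate, and mu the
   momentum coefficient. */
void update_weights(int n, double *w, const double *grad,
                    double *dw, double lr, double mu)
{
    for (int i = 0; i < n; i++) {
        dw[i] = mu * dw[i] - lr * grad[i];  /* momentum-smoothed step */
        w[i] += dw[i];                      /* move down the gradient */
    }
}
```

The same update applies to the bias vectors; in a threaded version, each thread can own a disjoint slice of the weights.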
Your training program should be called "nntrain", and should
- read a limit on the number of "epochs" (training iterations)
and the number of threads from the command line; if the number
of threads is not given, it should default to the actual number
of processors,
- read the training data arrays "X" and "Y", the stats on the
training data, "Xmean", "Xstd", "Ymean", "Ystd", the weight
matrices "W1", "W2", and "W3", and the bias vectors "B1",
"B2", and "B3",
- rescale the training data: subtract the mean, and divide by the
standard deviation, for both X and Y.
- do back-propagation training, for at most the requested number
of epochs, to improve the weight matrices and bias vectors,
- write out the new weight matrices "W1", "W2", and "W3" and bias
vectors "B1", "B2", and "B3".
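The rescaling step used by both programs (subtract the mean, divide by the standard deviation, per row) can be written as one small procedure over a column-order array; the name normalize is an illustrative assumption:

```c
/* Rescale a rows-by-cols matrix X in place, column by column:
   X[i][j] = (X[i][j] - mean[i]) / std[i], where mean and std have
   one entry per row, and X is stored in column order. */
void normalize(int rows, int cols, double *X,
               const double *mean, const double *std)
{
    for (int j = 0; j < cols; j++)
        for (int i = 0; i < rows; i++)
            X[i + j * rows] = (X[i + j * rows] - mean[i]) / std[i];
}
```

The inverse (multiply by std, add mean) undoes this for the nnfwd output.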
A typical invocation would be
nntrain 100 4
where the training should be done for at most 100 epochs, using 4
threads.
Getting Started
Since this is a course in parallel processing rather than neural
networks, demo Matlab procedures are provided
for both the forward calculation and for back-propagation training;
nnfwd.m is a Matlab implementation of nnfwd, and nntrain.m a Matlab
implementation of nntrain.
The Matlab procedures are
nnfwd.m -- demo program to do a forward calculation
nntrain.m -- demo program to train a network
nninit.m -- initial values for weight and bias matrices
nnparams.m -- prompt user for various net test parameters
nntest.m -- demo net test program
The program nntest.m prompts the user for network parameters,
generates some sample test data, calls nninit.m to generate initial
values for the weight matrices and bias and to calculate the stats
vectors, calls nntrain to train up the net, and nnfwd to check the
accuracy of the trained net.
Note that the project is to implement nnfwd and nntrain with
threads; you can use the provided nninit.m and nnparams.m to set
things up, and you can use nntest.m as a prototype for testing your
own networks.
Test Data
The Matlab procedures nninit.m to initialize
weight matrices and bias vectors and nntest.m to generate test data
are provided. Relatively small matrices--say, 4 or 5 element
inputs, outputs, and hidden layers, and maybe 100 epochs of
training--are fine for doing debugging and testing. The default
values are for a larger test, and are useful for benchmarks against
the Matlab code.
Timing
Both the training and forward calculation programs should report
"wall" runtime (not CPU usage) to the nearest second, with the Unix
time() system call. Start counting time after all the matrices are
read, and before any calculation is done, and stop counting time
after the calculations are done, and before any data is written.
Your time message should be printed to stderr.
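The timing described above can be sketched with a pair of small helpers around the time() call; the names timer_start and timer_report are illustrative:

```c
#include <stdio.h>
#include <time.h>

static time_t t_start;

/* Call after all matrices are read, before any calculation. */
void timer_start(void)
{
    t_start = time(NULL);
}

/* Call after the calculations, before any data is written;
   prints wall-clock seconds to stderr and returns the count. */
long timer_report(void)
{
    long elapsed = (long)(time(NULL) - t_start);
    fprintf(stderr, "wall time: %ld s\n", elapsed);
    return elapsed;
}
```

Note that time() has one-second resolution, which the assignment's "nearest second" requirement allows.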
Note that the forward calculation is very fast, in comparison to
typical training times; when I test your nnfwd I will use a large
enough input matrix, on the order of 100 x 1000, to get a
significant measurement.
Your programs should be at least as fast as the Matlab demo
programs when run on a single processor, and significantly faster
when run on multiple processors.
Submitting Your Project
Make a tar file containing the project files, that is, your *.c,
*.h, and Makefile, and submit this as an email attachment, with
subject "project 1". Make sure that your name and "project 1" are
at the top of your main files.
Grading
Make sure you read the page concerning
general information on programming projects. About 60% of your
grade is based on how well your code works, with the remainder of
the points divided between design and documentation. Projects are
due by midnight of the assigned date; there is a 5% bonus for each
day a project is turned in early (for up to two days early), and a
5% penalty for each day the project is turned in late.