MultiSubject Neural Nets Project

Background

Neural data is limited by the number of subjects available for research and by the technology used for experimentation. This is especially true for brain-machine interfaces (BMIs), which require invasive recordings. A BMI is a direct communication pathway between a brain and an external device that allows a person to control devices with their brain activity. BMIs are used in neuroprosthetic applications that aim to restore cognitive and sensorimotor functions, including hearing, sight, speech, and movement. Because both anatomy and functional responses differ between individuals, we will compare data from multiple subjects in order to improve the performance of a single-subject BMI. This project first explores multisubject learning through computer simulations on the MNIST toy dataset and then applies the approach to human neural data collected during speech production.

Deep Feedforward Neural Network with MNIST

The basic outline of the deep neural network was created using Pylearn2 and Theano, a Python library and mathematical expression compiler. We first used the MNIST toy dataset, a 60,000×784 matrix of handwritten digits from 0 to 9 (10 classes in total, each 28×28 image flattened to 784 pixels). Since this is a classification task, we used supervised learning, a machine learning approach that maps a set of inputs to their correct outputs. The cost and gradients were calculated using logistic regression and backpropagation, and the weights were updated by minimizing the cost function. Backpropagation is the method used for computing the gradient, while stochastic gradient descent performs the learning. The cost function is the cross-entropy:

Cost(\theta) = -\sum_{x} y(x)\,\log\hat{y}(x)

These are preliminary notes from the Jupyter notebook:

Cost and Gradients:

  • \hat{y} = softmax(xW + b) = T.nnet.softmax(X_sym.dot(W) + b.dimshuffle('x', 0))
  • cost = T.mean(T.nnet.categorical_crossentropy(y_hat, y_sym))
  • accuracy = T.mean(T.eq(y_sym, T.argmax(y_hat, axis=1)))
  • W_grad = T.grad(cost, W)
  • b_grad = T.grad(cost, b)

Theano functions:

  • f = theano.function(inputs=[X_sym, y_sym], outputs=[cost, accuracy])
  • f_updates = theano.function(inputs=[X_sym, y_sym], outputs=[cost, accuracy], updates=updates)
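
To tie these expressions together, here is a minimal, self-contained sketch of the softmax-regression training step in Theano. The weight initialization, learning rate, and plain SGD update rule are illustrative assumptions rather than the project's exact settings.

import numpy as np
import theano
import theano.tensor as T

n_features, n_classes = 784, 10
rng = np.random.RandomState(0)

# Model parameters as shared variables
W = theano.shared(np.asarray(0.01 * rng.randn(n_features, n_classes), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(n_classes, dtype=theano.config.floatX), name='b')

X_sym = T.matrix('X')    # minibatch of flattened 28x28 images
y_sym = T.ivector('y')   # integer class labels

y_hat = T.nnet.softmax(X_sym.dot(W) + b.dimshuffle('x', 0))
cost = T.mean(T.nnet.categorical_crossentropy(y_hat, y_sym))
accuracy = T.mean(T.eq(y_sym, T.argmax(y_hat, axis=1)))

W_grad = T.grad(cost, W)
b_grad = T.grad(cost, b)

# One step of stochastic gradient descent (the learning rate is an assumption)
learning_rate = 0.1
updates = [(W, W - learning_rate * W_grad),
           (b, b - learning_rate * b_grad)]

f = theano.function(inputs=[X_sym, y_sym], outputs=[cost, accuracy])
f_updates = theano.function(inputs=[X_sym, y_sym], outputs=[cost, accuracy], updates=updates)

Calling f_updates(X_batch, y_batch) repeatedly on minibatches performs the learning, while f evaluates cost and accuracy without changing the weights.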

Other variables from code:

  • train_objective: cost being optimized by training
  • train_y_nll: negative log-likelihood of the training labels at the current parameter values
  • nvis: number of visible units

Softmax regression (multinomial logistic regression) is used to classify among K classes: y(i) ∈ {1, …, K}. Whereas in logistic regression the labels are binary, y(i) ∈ {0, 1}, softmax regression allows us to handle multiple classes. It transforms a vector of activations into a probability distribution over the classes.

(p_{1}, \ldots, p_{n}) = \mathrm{softmax}(y_{1}, \ldots, y_{n}) = \left( \frac{e^{y_{1}}}{\sum_{j=1}^{n} e^{y_{j}}}, \ldots, \frac{e^{y_{n}}}{\sum_{j=1}^{n} e^{y_{j}}} \right)
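
As a quick sanity check of the formula, the snippet below computes a numerically stable softmax in NumPy; it is an illustrative example rather than project code.

import numpy as np

def softmax(y):
    # Subtracting the maximum does not change the result but avoids overflow
    e = np.exp(y - np.max(y))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))
# -> [0.09003057 0.24472847 0.66524096], which sums to 1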

The main implementation of the neural network was taken from Gustav’s blog (http://www.arngarden.com/2013/07/29/neural-network-example-using-pylearn2/), which shows a step-by-step process of training a neural network. Once the outline was largely set up, we grouped the code inside a function called analyze with the parameters n_train, params, and n_iter=5, where n_train is the number of training samples, params holds the model parameters, and n_iter is the number of times the analysis is run. This function returns misclass_all_iter, an n_iter×3 matrix (5×3 by default) containing the misclassification rates on the training, validation, and test sets for each iteration.
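
Based on that description, a skeleton of analyze might look like the following; the train_and_evaluate helper is hypothetical and stands in for the Pylearn2 training and evaluation code.

import numpy as np

def analyze(n_train, params, n_iter=5):
    # Train n_iter networks on n_train samples and collect misclassification rates
    misclass_all_iter = np.zeros((n_iter, 3))  # columns: train, validation, test
    for i in range(n_iter):
        # train_and_evaluate (hypothetical helper) builds, trains, and scores one network
        misclass_all_iter[i, :] = train_and_evaluate(n_train, params)
    return misclass_all_iter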

The MNIST dataset was divided into 3 sets: training, validation, and test. The training set is used to fit the model, the validation set is used to tune the model's parameters and to avoid over-fitting, and the test set is used for performance evaluation; performance on the test set gives a realistic estimate of how the model will behave on new, unseen data. The training set starts at a random index n between 0 and 50,000 - n_train and ends at n + n_train, and the validation set is the last 10,000 digits of the MNIST dataset. The data was tested with various layer types such as Sigmoid, Tanh, and RectifiedLinear, and momentum and learning-rate adjustors were used for optimization.
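
A sketch of this split using the Pylearn2 MNIST wrapper (which accepts which_set, start, and stop arguments, as in the blog example we followed); the n_train value here is only an example.

import numpy as np
from pylearn2.datasets.mnist import MNIST

n_train = 5000
n = np.random.randint(0, 50000 - n_train + 1)   # random start index between 0 and 50,000 - n_train

ds_train = MNIST(which_set='train', one_hot=True, start=n, stop=n + n_train)
ds_valid = MNIST(which_set='train', one_hot=True, start=50000, stop=60000)  # last 10,000 digits
ds_test = MNIST(which_set='test', one_hot=True)                             # separate 10,000-image test split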

\tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}

\mathrm{sigmoid}(a) = \frac{1}{1 + e^{-a}}

This was used to set up the neural network:

layers = [hidden_layer, output_layer]
ann = mlp.MLP(layers, nvis=784)
trainer.setup(ann, ds_train)
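
For context, here is a fuller sketch of the Pylearn2 setup those three lines come from, closely following the blog example linked above; the layer sizes, learning rate, and termination criterion are assumptions.

from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import EpochCounter

# Hidden sigmoid layer and softmax output layer (dimensions and iranges are assumptions)
hidden_layer = mlp.Sigmoid(layer_name='hidden', dim=500, irange=0.1)
output_layer = mlp.Softmax(layer_name='output', n_classes=10, irange=0.1)

layers = [hidden_layer, output_layer]
ann = mlp.MLP(layers, nvis=784)   # nvis = 784 visible units (28x28 pixels)

trainer = sgd.SGD(learning_rate=0.05, batch_size=100,
                  termination_criterion=EpochCounter(10))
trainer.setup(ann, ds_train)      # ds_train is the MNIST training set from above

# Keep training until the termination criterion says to stop
while True:
    trainer.train(dataset=ds_train)
    ann.monitor.report_epoch()
    if not trainer.continue_learning(ann):
        break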

After that, I made a plot showing accuracy vs. the number of training samples. The accuracy curve increased for all three sets and then plateaued. Next, Spearmint was used to perform Bayesian optimization of the hyperparameters. A new function called main(job_id, params) was implemented, with various parameters exposed for optimization. The Spearmint results showed a higher accuracy curve for the training set than for the validation and test sets.
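
Spearmint calls a user-supplied main(job_id, params) function for each set of hyperparameters it proposes and minimizes the value that the function returns. A hedged sketch of such a wrapper is shown below; the fixed n_train and the choice to minimize validation misclassification are illustrative assumptions.

def main(job_id, params):
    # params is a dict of hyperparameter values chosen by Spearmint, keyed by the
    # variable names declared in the experiment's config file
    n_train = 10000                      # assumed fixed training-set size
    misclass = analyze(n_train, params)  # n_iter x 3 matrix: train, validation, test
    # Return the mean validation misclassification; Spearmint minimizes this value
    return float(misclass[:, 1].mean())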

[Figure: acc_plot (accuracy vs. number of training samples)]

[Figure: Spearmint_best (best result found during Spearmint optimization)]

Multilayer Perceptron (Deep Feedforward Network)

After implementing the example neural network, we moved on to a multilayer perceptron (MLP) model with multiple inputs representing neural data from different subjects, a hidden layer, and an output layer that classifies the MNIST digits. The purpose of using this model is the hypothesis that leveraging data from multiple subjects can improve performance compared with using a single dataset, which on its own provides limited information. An MLP consists of an input layer, hidden layer(s), and an output layer, which in our model is shared among all the inputs. A feedforward network defines a mapping y = f(x; \theta) and learns the values of the parameters \theta that result in the best function approximation.

y = f(x; \theta, w) = \Phi(x; \theta)^{T} w
*The goal is to learn \Phi, the representation computed by the hidden layer, along with the output weights w.

We first worked on adapting the cost functions to the functionality and parameters of the model by modifying the original single-MLP code from the LISA Lab. Next, we created a new file called multisubject_network.py to extract some of the code from the previous files that had been written for a single MLP. We created an empty MLP list and a separate for-loop over each mlp in n_MLP. The next step is to fix the remaining bugs and make sure everything works.
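
As an illustration of the architecture described above, below is a hedged Theano sketch of a multisubject network in which each subject has its own input-to-hidden weights while the hidden-to-output softmax layer is shared; the sizes, the tanh nonlinearity, and the training details are assumptions, not the project's final design.

import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(0)
floatX = theano.config.floatX
n_subjects, n_in, n_hidden, n_classes = 3, 784, 100, 10

def shared_w(shape, name):
    return theano.shared(np.asarray(0.01 * rng.randn(*shape), dtype=floatX), name=name)

# One input layer per subject (all subjects have n_in features here for simplicity)
W_in = [shared_w((n_in, n_hidden), 'W_in_%d' % s) for s in range(n_subjects)]
b_in = [theano.shared(np.zeros(n_hidden, dtype=floatX), name='b_in_%d' % s) for s in range(n_subjects)]

# Output layer shared by all subjects
W_out = shared_w((n_hidden, n_classes), 'W_out')
b_out = theano.shared(np.zeros(n_classes, dtype=floatX), name='b_out')

X_sym = T.matrix('X')    # a minibatch from a single subject
y_sym = T.ivector('y')
learning_rate = 0.1

# One training function per subject; the output-layer parameters are the same
# shared variables in every function, so every subject's data updates them
train_fns = []
for s in range(n_subjects):
    hidden = T.tanh(X_sym.dot(W_in[s]) + b_in[s])
    y_hat = T.nnet.softmax(hidden.dot(W_out) + b_out)
    cost = T.mean(T.nnet.categorical_crossentropy(y_hat, y_sym))
    params = [W_in[s], b_in[s], W_out, b_out]
    updates = [(p, p - learning_rate * T.grad(cost, p)) for p in params]
    train_fns.append(theano.function([X_sym, y_sym], cost, updates=updates))

During training, each minibatch would be routed to the function of the subject it came from, so the shared output weights accumulate gradients from every subject.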

 
