Sunday, August 21, 2016

Keras/Jupyter notebooks for my Gennovation Talk @ San Francisco


Last Wednesday, I co-presented a Gennovation Talk on Artificial Intelligence, Machine Learning and Deep Learning. My co-presenter (actually the main presenter) was Abhishek Sharma, organizer of the San Francisco Deep Learning Enthusiasts Meetup group. You can find the slides for the talk here. I covered the part on Deep Learning, i.e., slides 19-34. Ours was the first talk in the series.

The Gennovation talks are organized by Genesys, creators of the world's #1 Customer Experience (CX) Platform, which empowers their client companies to create exceptional omnichannel experiences, journeys and relationships. The talk was held at their very impressive offices in Daly City, a few minutes' walk from the BART station. The talks are the brainchild of Merijn te Booij, Chief Marketing Officer at Genesys and an evangelist for the use of machine learning techniques in the CX workflow. While primarily aimed at Genesys employees, the talks are open to the public. At our talk, there were about 30 Genesys employees and about 40 non-employees.

The talk was pitched at a somewhat high level. We aimed for breadth of coverage, trying to give the audience a taste of everything related to the three topics in our title. For my portion of the talk, I built a number of demos as Jupyter (IPython) notebooks. The demos are for toy problems and are written using the excellent Keras library.

The Keras library provides a minimalistic and modular interface over either Theano or Tensorflow backends. Working in Keras is an order of magnitude simpler and more convenient than working with Tensorflow (and I suspect Theano as well, although I haven't done anything significant with Theano yet). The only time you might want to use Tensorflow (or Theano) is when your network needs a component that isn't available in Keras or cannot be composed from things that are available in Keras.
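To give a sense of how minimal the interface is, here is a rough sketch of a small fully connected binary classifier in Keras. The layer sizes, compile settings and the Xtrain/ytrain names are just placeholders, not anything taken from the notebooks below.

```python
from keras.models import Sequential
from keras.layers import Dense

# A small fully connected binary classifier, declared layer by layer.
model = Sequential()
model.add(Dense(64, input_dim=20, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

# Keras compiles this down to whichever backend (Theano or Tensorflow) is configured.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Training is a single call once you have Xtrain/ytrain arrays (hypothetical names):
# model.fit(Xtrain, ytrain, batch_size=32, nb_epoch=10)  # nb_epoch is called epochs in later Keras versions
```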

Francois Chollet, creator of Keras, has already blogged about using Keras as a simplified interface to Tensorflow, where he talks about calling Keras from Tensorflow. Although I haven't encountered such a situation yet (very likely because I haven't built very complex or novel models), I think that the opposite direction would be of more interest, i.e., calling a Tensorflow model from Keras. But maybe all I would have to do then is build a custom layer using the Keras backend abstraction and plain Python. I guess I will cross that bridge when I get to it.
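For completeness, here is the kind of thing I mean by a custom layer built with the Keras backend abstraction: a made-up activation function written against keras.backend (so it runs on either Theano or Tensorflow) and wrapped in a Lambda layer. The function itself is purely illustrative.

```python
from keras import backend as K
from keras.layers import Lambda

# An illustrative custom op written against the backend abstraction;
# K dispatches to Theano or Tensorflow depending on configuration.
def scaled_tanh(x):
    return 1.7159 * K.tanh(2.0 * x / 3.0)

# Wrapping it in a Lambda layer lets it slot into any Keras model.
custom_activation = Lambda(scaled_tanh)
```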

Anyway, I thought the notebooks might be of interest to readers here, hence this post. All of them are in my sujitpal/intro-dl-talk-code project on Github, so you can just download them and run them locally. Most of them complete in a reasonable time on a CPU-only system, except the last one, which runs for around 5-6 hours. After the list, I have included a rough Keras sketch of the core model in each notebook.

  • 01-nonlinearity.ipynb - This notebook is adapted from the blog post Simple end-to-end Tensorflow examples by Jason Baldridge. The idea is to treat a Fully Connected Network (FCN) as something of a golden hammer, showing that by increasing its depth and the number of hidden units in each layer, it can be configured to classify increasingly complex datasets. The datasets themselves are synthetic and available from scikit-learn.
  • 02-mnist-mlp.ipynb - Using an FCN to classify MNIST digits is almost the "Hello World" example for Deep Learning. The image is flattened from a matrix of size (28,28) to a vector of size (784,) for input to the network. The MNIST data is available from scikit-learn.
  • 03-mnist-cnn.ipynb - The logical next step is to use a Convolutional Neural Network (CNN), which can exploit the geometry of the image. The code is adapted from the blog post LeNet - Convolutional Neural Network in Python by Adrian Rosebrock of PyImageSearch. As expected, it performs better than the FCN. The MNIST data is reused from the previous example.
  • 04-umich-sentiment-analysis.ipynb - This notebook uses a Long Short Term Memory (LSTM) network to read sentences and compute a sentiment value between 0 (negative) and 1 (positive) for each sentence. The code is adapted from this Keras example that does sentiment analysis on the IMDB dataset. The data for this exercise came from the University of Michigan SI650 in-class contest on Kaggle.
  • 05-alice-rnn-langmodel.ipynb - This notebook trains a character-based Recurrent Neural Network (RNN) language model using the text of Alice in Wonderland from the Project Gutenberg website. The idea is to feed it 10 characters and teach it to predict the 11th. The hope is that it can predict words (and hopefully intelligent phrases and sentences). Although it falls short on the latter, after about 50 epochs of training it does learn to spell. The notebook is inspired by the blog post The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy.
  • 06-redrum-mt-lstm.ipynb - This was actually the last assignment for the Deep Learning MOOC on Udacity, although the course expects you to code it in Tensorflow. The network is a sequence-to-sequence network using LSTMs, commonly used in Machine Translation applications. My network is a character-based model trained on sequences of 4 words from the text of Alice in Wonderland, and returns 4-word sequences with the characters in each word reversed. The code was adapted from this Keras example on sequence-to-sequence learning.
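Here are the rough Keras sketches of the core model in each notebook, in the same order as the list above. They are not copy-pasted from the notebooks; layer sizes, epoch counts and data-loading details are placeholders. First, the FCN on a synthetic scikit-learn dataset (01-nonlinearity):

```python
from sklearn.datasets import make_moons
from keras.models import Sequential
from keras.layers import Dense

# A synthetic, non-linearly separable dataset from scikit-learn.
X, y = make_moons(n_samples=1000, noise=0.1)

# Adding depth and width lets the FCN fit increasingly complex decision boundaries.
model = Sequential()
model.add(Dense(16, input_dim=2, activation="relu"))
model.add(Dense(16, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, nb_epoch=20, batch_size=32)   # nb_epoch is epochs in later Keras versions
```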
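The MNIST FCN (02-mnist-mlp) flattens each (28, 28) image into a (784,) vector and feeds it through a couple of dense layers:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Input is the flattened 784-dim image; output is a softmax over the 10 digit classes.
model = Sequential()
model.add(Dense(512, input_dim=784, activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(10, activation="softmax"))
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(Xtrain, Ytrain, ...) where Xtrain has shape (n, 784) and Ytrain is one-hot encoded.
```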
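The CNN version (03-mnist-cnn) keeps the image geometry instead of flattening it. This is a LeNet-style stack in Keras 1.x syntax (Keras 2 renames Convolution2D to Conv2D and takes the kernel size as a tuple); channels-last image ordering is assumed:

```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense

# Convolution and pooling layers operate on the (28, 28, 1) image grid directly.
model = Sequential()
model.add(Convolution2D(20, 5, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(50, 5, 5, activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation="relu"))
model.add(Dense(10, activation="softmax"))
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```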
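The sentiment model (04-umich-sentiment-analysis) embeds word indices, runs them through an LSTM, and squashes the output to a single value between 0 and 1. The vocabulary size and sequence length below are placeholders for values computed from the corpus:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE, MAX_LEN = 5000, 40   # placeholders; set from the actual corpus

# Sentences arrive as padded sequences of word indices of length MAX_LEN.
model = Sequential()
model.add(Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN))
model.add(LSTM(64))
model.add(Dense(1, activation="sigmoid"))   # 0 = negative, 1 = positive
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```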
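The character language model (05-alice-rnn-langmodel) sees a window of 10 one-hot encoded characters and predicts a distribution over the next character; the RNN size and character vocabulary size here are placeholders:

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

SEQLEN, NB_CHARS = 10, 60   # 10 input characters; ~60 distinct characters (placeholder)

# Input: (SEQLEN, NB_CHARS) one-hot window; output: softmax over the 11th character.
model = Sequential()
model.add(SimpleRNN(128, input_shape=(SEQLEN, NB_CHARS)))
model.add(Dense(NB_CHARS, activation="softmax"))
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```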
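Finally, the word-reversal model (06-redrum-mt-lstm) follows the encoder/RepeatVector/decoder pattern used in the Keras sequence-to-sequence example; the sequence lengths and character vocabulary size are placeholders:

```python
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

INPUT_LEN, OUTPUT_LEN, NB_CHARS = 30, 30, 50   # placeholders

model = Sequential()
# Encoder: compress the input character sequence into a single vector.
model.add(LSTM(128, input_shape=(INPUT_LEN, NB_CHARS)))
# Repeat that vector once per output timestep for the decoder.
model.add(RepeatVector(OUTPUT_LEN))
# Decoder: emit a character distribution at every output timestep.
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(NB_CHARS, activation="softmax")))
model.compile(optimizer="adam", loss="categorical_crossentropy")
```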

That's all I have for today. I hope you find these models useful for your own work. As you can see from the links in the post, Deep Learning is a very active field and there are many people out there doing great work. If you want to learn more, there are some more links in my slides. Although my job now involves some Deep Learning work, I am still learning (as fast as I can). Now that I have a project set up, I will try to add more models to this set, so I can deepen my own understanding (and possibly use them for demos later).

