Sunday, October 9, 2016

TensorFlow Tutorial

Overview

Here, I would like to share my experience using TensorFlow. Along the way, you will learn how to set it up and use it in your own projects.

Setting it up

The official installation page is the most reliable guide. The only problem with it is that you cannot tell which are the main headings and which are sub-headings unless you pay attention to the sidebar. This leads to confusion about what to install and what not to. Let me summarize it for you here:

The main point to note is that TensorFlow needs Python. In addition, if you are exploring the GPU version, it needs the CUDA Toolkit and cuDNN. There are two main ways to install TensorFlow:
  1. From binary packages
  2. From source on GitHub

First option

The first option is the easiest, and I feel one should prefer it unless you need different versions of the packages that TensorFlow depends on. Even within the first option, there are multiple ways to install:
  1. Pip install
  2. Virtualenv install
  3. Anaconda install
  4. Docker install
As the official guide points out, a plain pip install might affect existing Python programs on the machine; the other options do not. Among them, I found the Anaconda install to be the easiest. In my experience, installing Anaconda itself is a piece of cake. Here are the details on how to install it. Once you have it set up, you can install TensorFlow very quickly as well.

If you are okay with the CPU version, you are done and ready to start using TensorFlow.
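A quick way to confirm that the install worked is to import TensorFlow and print its version. The helper below is a minimal sketch of that check; the function name `tensorflow_version` is my own, and it relies only on the standard `__version__` attribute.

```python
import importlib.util


def tensorflow_version():
    """Return the installed TensorFlow version string, or None if TensorFlow is not installed."""
    # find_spec looks the package up without importing it, so a missing install is not an error.
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # a broken install fails loudly here
    return tf.__version__


if __name__ == "__main__":
    version = tensorflow_version()
    print("TensorFlow not found" if version is None else "TensorFlow " + version)
```

Run it from inside the environment where you installed TensorFlow (for example, the Anaconda environment); otherwise it will report that TensorFlow was not found.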

If you want to use the GPU version (supported at present only on Linux), you need to install the right versions of the CUDA Toolkit and cuDNN, and then export the relevant environment variables. That's it. You can find the exact details on the official link. Getting the NVIDIA binaries (cuDNN in particular) requires registering on NVIDIA's site.
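The exact variables depend on the versions you install, so treat the official guide as the authority. As far as I know, the usual Linux setup points `CUDA_HOME` at the toolkit directory and adds its `lib64` directory to `LD_LIBRARY_PATH`. The sketch below checks just those two, assuming the default `/usr/local/cuda` location; the helper name `missing_cuda_settings` is hypothetical, and it does not verify cuDNN or the driver.

```python
import os

# Assumed default install location of the CUDA Toolkit on Linux; change it if yours differs.
DEFAULT_CUDA_HOME = "/usr/local/cuda"


def missing_cuda_settings(env=None, cuda_home=DEFAULT_CUDA_HOME):
    """List the CUDA-related environment variables that do not look set up.

    Checks (both based on the usual Linux setup, an assumption):
      * CUDA_HOME points at the toolkit directory.
      * LD_LIBRARY_PATH contains the toolkit's lib64 directory.
    """
    env = os.environ if env is None else env
    home = cuda_home.rstrip("/")
    problems = []
    if env.get("CUDA_HOME", "").rstrip("/") != home:
        problems.append("CUDA_HOME")
    # LD_LIBRARY_PATH is a colon-separated list of directories on Linux.
    search_path = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":")]
    if home + "/lib64" not in search_path:
        problems.append("LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    problems = missing_cuda_settings()
    print("Looks fine" if not problems else "Check: " + ", ".join(problems))
```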

Second option

I was not successful when I tried to set up TensorFlow with this option; I had issues with Bazel. If you want to explore this one, following the official link might help. I tried the steps described on a few blogs, and they didn't work well for me.

Learning a DNN

Let us start simple. I built a simple logistic regression model for the Iris dataset by following the steps in this link. The Iris data can be obtained from the UCI Machine Learning Repository. If you download it from there, replace

ipd = pd.read_csv("iris.csv")

with

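ipd = pd.read_csv("iris.data.txt", names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species'])

The UCI file has no header row and five columns, the last being the class label, so every column needs a name; with only four names, pandas uses the first column as the index and the names shift by one. 'Species' is my own choice here; use whatever name the tutorial script expects for the label column.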
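To see what the model in that tutorial actually computes, here is a minimal NumPy sketch of multinomial (softmax) logistic regression trained with full-batch gradient descent. This is not the tutorial's TensorFlow code; it is a hand-written stand-in for the same model, and the function names, learning rate, and epoch count are my own choices.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_logistic_regression(X, y, num_classes, lr=0.1, epochs=500):
    """Fit weights W and bias b by gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = one_hot(y, num_classes)
    for _ in range(epochs):
        probs = softmax(X @ W + b)
        grad = (probs - Y) / n  # gradient of the mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    """Return the most probable class index for each row of X."""
    return np.argmax(X @ W + b, axis=1)
```

To try it on Iris, use the four measurement columns of `ipd` as `X` (standardizing each column first helps plain gradient descent) and turn the species strings into integer labels, for example with `pd.factorize`.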

Once you have run the script end-to-end, you can add a hidden layer and repeat the process to check whether the hidden layer helps. When I added a hidden layer with 5 nodes, it did not result in any improvement. This makes sense: the Iris data has only 4 features and 150 samples, so there is little room for the extra capacity of a hidden layer to pay off.
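Continuing with a NumPy stand-in (again, not the tutorial's TensorFlow code), adding a hidden layer means one more matrix multiply and a nonlinearity before the softmax, plus one extra backpropagation step. The sketch below uses 5 hidden units to mirror the experiment; the tanh activation, initialization scale, learning rate, and epoch count are my assumptions.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_one_hidden_layer(X, y, num_classes, hidden=5, lr=0.1, epochs=2000, seed=0):
    """Fit a softmax classifier with one tanh hidden layer by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Small random weights break the symmetry between hidden units.
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, num_classes))
    b2 = np.zeros(num_classes)
    Y = one_hot(y, num_classes)
    for _ in range(epochs):
        # Forward pass.
        H = np.tanh(X @ W1 + b1)
        probs = softmax(H @ W2 + b2)
        # Backward pass for the mean cross-entropy loss.
        d_logits = (probs - Y) / n
        d_W2 = H.T @ d_logits
        d_b2 = d_logits.sum(axis=0)
        d_H = (d_logits @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)**2
        d_W1 = X.T @ d_H
        d_b1 = d_H.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * d_W1
        b1 -= lr * d_b1
        W2 -= lr * d_W2
        b2 -= lr * d_b2
    return W1, b1, W2, b2


def predict_one_hidden_layer(X, W1, b1, W2, b2):
    """Return the most probable class index for each row of X."""
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

To check whether the hidden layer helps, train this and the plain logistic regression on the same training split and compare accuracy on held-out rows. On a dataset as small as Iris, a difference of one or two misclassified samples can easily come down to chance.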

Next, I tried the Adult dataset, which is also part of the UCI repository. Here is the Python notebook for it.
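The Adult files need a little more care when loading than Iris. The sketch below assumes the layout of the UCI `adult.data` file: no header row, fields separated by a comma and a space, `?` for unknown values, and the income class (`>50K` or `<=50K`) in the last column. The column names follow the UCI `adult.names` description, except `income` and `label`, which are my own; dropping rows with missing values is one simple choice, not necessarily what the notebook does.

```python
import pandas as pd

# Column names from the UCI "adult.names" description; "income" is my own name for the class column.
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]


def load_adult(path_or_buffer, skiprows=0):
    """Read an Adult data file into a DataFrame with a 0/1 "label" column (1 means >50K)."""
    df = pd.read_csv(
        path_or_buffer,
        names=ADULT_COLUMNS,     # the file has no header row
        skipinitialspace=True,   # fields are separated by ", "
        na_values="?",           # unknown values are written as "?"
        skiprows=skiprows,
    )
    # Drop rows with any missing value, then derive the binary target.
    # rstrip(".") accepts labels written as ">50K." as well as ">50K".
    return df.dropna().assign(
        label=lambda d: (d["income"].str.rstrip(".") == ">50K").astype(int)
    )
```

`load_adult("adult.data")` returns the cleaned frame. If I remember correctly, the companion `adult.test` file starts with one extra non-data line and ends each label with a period, so `load_adult("adult.test", skiprows=1)` should handle it.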