Saturday, August 29, 2015

Python Tutorial

Motivation

I found a couple of tutorials on Machine Learning techniques that interested me, but they were implemented in Python, whose syntax was new to me. Also, scikit-learn and Spark can be used from Python and have become popular Machine Learning tools. So I thought, why not learn a bit of Python? Thus, my exploration began.

Setting up Python

The first thing I noticed about Python is that a script written for one version may not run on another, even if it uses no new features. For example, Python 2 and Python 3 differ in basic syntax such as print. So, you need to get the right version. I will be using Python 2.7.10, which comes as part of Anaconda, for all my explorations. Anaconda bundles many useful Python packages with the Python distribution and is very easy to install:
  1. Download the .sh file
  2. Run bash <.sh file downloaded>
  3. Accept the Terms and Conditions, set path for installation and you are done.
Once it is installed, you can enter the Python prompt by running <Path where anaconda is installed ending in /anaconda>/bin/python2.7

You can exit from the Python prompt using quit() or Ctrl-D.
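
Since scripts can behave differently across versions, it may help to confirm which interpreter is actually running. Here is a minimal sketch using the standard sys module. The variable names are only illustrative.

```python
import sys

# sys.version_info holds the interpreter version as (major, minor, micro, ...)
major = sys.version_info[0]
minor = sys.version_info[1]
version_text = "%d.%d" % (major, minor)
print("Running Python " + version_text)

# is_python2 is a hypothetical flag name used only for this illustration
is_python2 = (major == 2)
```

A script could check such a flag at the top and stop early with a clear message if the version is not the expected one.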

Unique to Python

In Python:
  1. Declaring variables is simpler, and one need not mention the type of the variable.
  2. A colon (:) together with indentation (spaces or tabs) defines scope. (In other languages like Java or C, curly braces define scope and indentation only enhances the readability of the code.)
  3. Lines need not end in ; unlike in other languages.
  4. # starts a single-line comment. Text enclosed in """ (three ") at the start and end can span multiple lines and is often used as a multi-line comment, although it is really a string literal.
  5. Objects are of two kinds: mutable and immutable. The value of an immutable object cannot be changed once it is created.
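
To make the last point concrete, here is a small sketch contrasting a list (mutable) with a tuple (immutable). The names numbers, point and tuple_changed are made up for illustration.

```python
# A list is mutable: an element can be replaced in place
numbers = [1, 2, 3]
numbers[0] = 10

# A tuple is immutable: item assignment raises TypeError
point = (1, 2, 3)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(numbers)        # [10, 2, 3]
print(tuple_changed)  # False
```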

Let us compare declaring an array and printing its values with a for loop to see the difference between the Java and Python ways of coding. In Java (the presence or absence of indentation doesn't matter):

int[] a = new int[]{1,2,3,4,5};
System.out.println("Start");
for(int i=0;i<5;i++){
System.out.println("Value of a["+i+"] is "+a[i]);
}
System.out.println("End");

Coding the same task in Python (variables need not be declared, but numbers must be converted with str() before being joined to a string; indentation decides the scope of the for loop):

a = [1,2,3,4,5]
print("Start")
for i in range(0, 5):
     print("Value of a["+str(i)+"] is "+str(a[i]))
print("End")


Another way of printing in Python:

for i in range(0, 5):
     print "Value of a[%d] is %d" % (i,a[i])

When the first line of the for loop is typed at the prompt and the Enter key is pressed, one sees ... as the prompt. To enter the loop body, one needs to add indentation via spaces or a tab and then type the line that is part of the loop. To exit the loop and execute it, one needs to press Enter twice.
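
The upper limit passed to range is exclusive, so range(0, 5) covers indices 0 to 4. The sketch below avoids hard-coding that limit by taking it from the list with len. The list lines is only an illustrative helper for collecting the text before printing.

```python
a = [1, 2, 3, 4, 5]

# Build one line of text per element; range(len(a)) yields 0, 1, 2, 3, 4
lines = []
for i in range(len(a)):
    lines.append("Value of a[%d] is %d" % (i, a[i]))

# print(...) with a single argument behaves the same in Python 2.7 and 3
for line in lines:
    print(line)
```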

Handling Arrays (Lists) and Tuples

Since arrays are commonly used, let us look at how to create an array and access its elements. In Python, arrays are referred to as lists, and indices start from 0.

Defining at once: a = [1,2,3,4,5] 

Starting with an empty array and filling it up:

a = []
for i in range(0, 5):
     a.append(i+1)

Another way of creating an array with a for loop (a list comprehension): a = [i+1 for i in range(5)] # The lower limit of range is 0 by default

Creating an array with same value, say 10, in each slot: a = [10]*5

Size of an array: len(a)
Accessing element at a particular index i: a[i]

Accessing all elements from index 2 till the end:  
for e in a[2:]:
     print e 

Accessing all elements, except last 3 elements:  
for e in a[:-3]:
     print e
  
Concatenate two arrays a and b: a + b
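
The slicing and concatenation operations above can be tried out as follows. This is a small sketch with made-up lists a and b. Note that slices and + return new lists and leave the originals unchanged.

```python
a = [1, 2, 3, 4, 5]
b = [6, 7]

from_index_two = a[2:]        # elements from index 2 till the end
without_last_three = a[:-3]   # everything except the last 3 elements
joined = a + b                # a new list; a and b are unchanged

print(from_index_two)         # [3, 4, 5]
print(without_last_three)     # [1, 2]
print(joined)                 # [1, 2, 3, 4, 5, 6, 7]
```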
 
While a list (array) is a mutable object, a tuple is an immutable object and uses parentheses instead of square brackets. A tuple can only be defined at once. However, accessing elements in a tuple is similar to accessing elements in an array.

Accessing corresponding elements in two arrays (it yields pairs of corresponding elements until the shorter of the two arrays is exhausted):
for e,f in zip(a,b):
     print e,f #Prints e and f values with space in between
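
To see where zip stops, here is a short sketch with two lists of different lengths. The lists are made up, and list(...) is wrapped around zip so the same code also runs on Python 3, where zip returns an iterator.

```python
a = [1, 2, 3, 4, 5]
b = ['x', 'y', 'z']

# Pairing stops after 3 items because b has only 3 elements
pairs = list(zip(a, b))
print(pairs)   # [(1, 'x'), (2, 'y'), (3, 'z')]
```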
  

Handling 2D Matrices (List of Lists)

Defining at once: m = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]

Starting with an empty matrix and filling it up:

m = []
for row in range(5):
      a = []
      for col in range(4):
           a.append(col+row*4)
      m.append(a)

Creating a matrix of 5X4 with values from 0 to 19: m = [[col+row*4 for col in range(4)] for row in range(5)]

Creating a matrix of 5X4 with all elements having 0: m = [4*[0] for row in range(5)] #5*[4*[0]] would make all 5 rows refer to the same list
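
One caveat when building a nested list by repetition such as 5*[4*[0]]: the outer multiplication repeats a reference to the same inner list, so changing one row changes all of them. This sketch shows the difference. The names shared and independent are illustrative only.

```python
# Repetition copies the reference to one inner list five times
shared = 5 * [4 * [0]]
shared[0][0] = 1
print(shared[4][0])        # 1, because every row is the same list object

# A comprehension builds a fresh inner list for each row
independent = [4 * [0] for row in range(5)]
independent[0][0] = 1
print(independent[4][0])   # 0, the other rows are unaffected
```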

Number of rows: len(m) 

Number of columns: len(m[0])
Accessing i-th row and j-th column of a matrix: m[i][j]
 
Accessing i-th row of a matrix: m[i]

Accessing j-th column of a matrix: zip(*m)[j] #Need to transpose the matrix to access the column as an array
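
Here is a sketch of column access on the 5X4 matrix defined earlier. Indexing zip(*m) directly relies on zip returning a list, as it does in Python 2. Wrapping it in list(...) keeps the code working on Python 3 as well. A list comprehension is an alternative that returns a list rather than a tuple.

```python
m = [[col + row * 4 for col in range(4)] for row in range(5)]
j = 1

# Transpose with zip, then pick the j-th column (a tuple)
column_by_zip = list(zip(*m))[j]

# Alternative: pick element j from every row (a list)
column_by_loop = [row[j] for row in m]

print(column_by_zip)    # (1, 5, 9, 13, 17)
print(column_by_loop)   # [1, 5, 9, 13, 17]
```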

Concatenate or stack two matrices m and n (they need not have the same number of columns): m + n #A 5X4 matrix concatenated with a 3X2 matrix yields a matrix with 5+3=8 rows, the first 5 rows having 4 entries and the next 3 rows having 2 entries

Accessing corresponding rows in two matrices (it yields pairs of rows until the matrix with fewer rows is exhausted):
for e,f in zip(m,n):
     print e,f #Prints rows e and f with a space in between

Handling Dictionaries

Defining at once: d = {'name':'XYZ','age':25} #The order in which keys are stored is not guaranteed in Python 2.7

Starting with an empty dictionary and filling it up:

d = {}
keys = ['name','age']
values = ['XYZ',25]
for i in range(2):
     d[keys[i]] = values[i]

Using two parallel lists, keys and values, to get dictionary at once: d = dict(zip(keys,values))  

Accessing value corresponding to a key in the dictionary: d[key]

Accessing value corresponding to a key in the dictionary safely (without producing error if key is not present in the dictionary): d.get(key)
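
As a short illustration of safe access: get returns None for a missing key unless a default is passed as the second argument. The key 'city' and the default 'unknown' are made up for this sketch.

```python
d = {'name': 'XYZ', 'age': 25}

missing = d.get('city')               # None, no error is raised
fallback = d.get('city', 'unknown')   # the supplied default
present = d.get('age', 0)             # the stored value, default ignored

print(missing)    # None
print(fallback)   # unknown
print(present)    # 25
```

By contrast, d['city'] would raise a KeyError here.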

Concatenate (merge) two dictionaries d and e: dict(d.items() + e.items()) #For a key present in both, the value from e wins
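
The expression d.items() + e.items() depends on items() returning lists, which holds in Python 2 but not in Python 3. A version-independent sketch of the same merge uses copy and update. The dictionary e and the name merged are made up for illustration.

```python
d = {'name': 'XYZ', 'age': 25}
e = {'age': 30, 'city': 'ABC'}

# Copy d first so that the original dictionary is left untouched
merged = d.copy()
merged.update(e)   # values from e replace those from d for shared keys

print(merged['age'])   # 30
print(d['age'])        # 25
```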

Clear dictionary: d.clear()

Delete an entry in the dictionary with specified key: del d[key]

Delete entire dictionary: del d #Removes the name d itself, so using d afterwards raises a NameError

More possible operations on dictionaries can be found in this link.