Tuesday, April 5, 2011

Weka on Java through Eclipse - Getting Started

I wanted to use Weka in my Java code. Weka is a useful machine learning toolkit: it provides all the basic classifiers, clustering techniques, and much more, and these can be trained and used easily. I develop Java code in Eclipse, so I wanted to learn how to integrate Weka into my code there. I had to learn the following:

  • How to make Weka's functionality available in my Java code? - I had to find the Weka jar file and add it to my project. I found Weka with Java - Getting Started well suited to this job.

  • How to load a dataset (stored as a CSV or ARFF file), how to filter the dataset by passing options to the filter, how to train a classifier with specified options, and so on? These are explained very nicely, with examples, in Weka with Java - Essentials.

  • How to save a classifier/clusterer/associator model and how to load it again for prediction? The model file is usually saved as <name>.model (e.g. J48.model, Naive.model). Somehow, "-d <model file name>" for saving the model and "-l <model file name>" for loading it do not seem to work. However, the following works (Source):
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.InputStream;
import java.io.FileInputStream;
import java.io.ObjectInputStream;

import weka.classifiers.bayes.NaiveBayes;

// Lines of code corresponding to the building of the model
// (naiveBayesClassifier is the trained NaiveBayes, modelFileName is the path of the .model file)

// Save the model file
OutputStream os = new FileOutputStream(modelFileName);
ObjectOutputStream objectOutputStream = new ObjectOutputStream(os);
objectOutputStream.writeObject(naiveBayesClassifier);
// Close the stream so that the whole model is flushed to disk before it is read back
objectOutputStream.close();

// Read the model file
InputStream is = new FileInputStream(modelFileName);
ObjectInputStream objectInputStream = new ObjectInputStream(is);
// readObject() returns Object, so cast it back to the classifier type that was saved
naiveBayesClassifier = (NaiveBayes) objectInputStream.readObject();
objectInputStream.close();
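
The save/load steps above are plain Java serialization, so they can be wrapped into two small helper methods. The following is only a sketch under some assumptions: ModelStore, save, load and ToyModel are names I made up for illustration, and ToyModel is a stand-in so that the example runs without the Weka jar. With Weka on the classpath, a trained classifier such as NaiveBayes could be passed instead, since Weka classifiers are Serializable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelStore {

    // Writes any Serializable model to a file; try-with-resources closes the stream even on errors.
    public static void save(Serializable model, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads a model back and casts it to the expected type.
    public static <T> T load(Path file, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return type.cast(in.readObject());
        }
    }

    // Hypothetical stand-in for a trained classifier (not part of Weka).
    public static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double threshold;

        public ToyModel(double threshold) {
            this.threshold = threshold;
        }

        public String classify(double value) {
            return value >= threshold ? "yes" : "no";
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("toy", ".model");
        save(new ToyModel(0.5), file);
        ToyModel restored = load(file, ToyModel.class);
        System.out.println(restored.classify(0.7)); // prints "yes"
        Files.deleteIfExists(file);
    }
}
```

The try-with-resources blocks make sure both streams are closed, which is easy to forget when the streams are opened by hand. Weka also ships a weka.core.SerializationHelper class with write and read methods that should do much the same thing, although I have not used it here.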