
I have used Weka to make a Naive Bayes classifier through the Weka GUI. Then I saved this model by following this tutorial. Now I want to load this model through Java code, but I am unable to find any way to load a saved model using Weka.

My requirement is that I have to build the model separately and then use it in a separate program.

If anyone can guide me in this regard, I will be thankful.

Jonathan Leffler
Hammad Hassan

1 Answer


You can easily load a saved model in Java using this command:

Classifier myCls = (Classifier) weka.core.SerializationHelper.read(pathToModel);
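SerializationHelper.read returns a plain Object, which is why the result has to be cast to Classifier. As far as I know it builds on standard Java serialization, the same mechanism the "Save model" step of the workflow below uses with ObjectOutputStream. Here is a minimal, stdlib-only sketch of that save/load round trip; ToyModel is a hypothetical stand-in for a trained Weka classifier, and ModelRoundTrip, save and load are illustrative names, not Weka API:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ModelRoundTrip {
    // Hypothetical stand-in for a trained model; it only needs to be Serializable.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double[] weights;

        ToyModel(String name, double[] weights) {
            this.name = name;
            this.weights = weights;
        }
    }

    // Writes any Serializable object to a file with ObjectOutputStream.
    static void save(Object model, File file) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(model);
        }
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("mymodel", ".model");
        file.deleteOnExit();
        save(new ToyModel("naive-bayes", new double[] {0.25, 0.75}), file);
        ToyModel loaded = (ToyModel) load(file);
        System.out.println(loaded.name + " " + loaded.weights[1]);
    }
}
```

Because the file records class names and field layouts, the program that reads it needs compatible versions of the same classes on its classpath; a model written with one Weka version may fail to load under another.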

For a complete workflow in Java, I wrote the following article in SO Documentation, which is copied here:

Text Classification in Weka

Text Classification with LibLinear

  • Create training instances from an .arff file:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    private static Instances getDataFromFile(String path) throws Exception {
        DataSource source = new DataSource(path);
        Instances data = source.getDataSet();

        if (data.classIndex() == -1) {
            // use the last attribute as the class attribute
            data.setClassIndex(data.numAttributes() - 1);
        }

        return data;
    }

    Instances trainingData = getDataFromFile(pathToArffFile);
  • Use StringToWordVector to transform your string attributes into a numeric representation:

    • Important features of this filter:

      1. tf-idf representation
      2. stemming
      3. lowercase words
      4. stopwords
      5. n-gram representation

     

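    import java.io.File;
    import weka.core.SelectedTag;
    import weka.core.stemmers.LovinsStemmer;
    import weka.core.stemmers.Stemmer;
    import weka.core.stopwords.WordsFromFile;
    import weka.core.tokenizers.NGramTokenizer;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    // useIdf, useStemmer, minTermFreq, minGrams and maxGrams are your own settings
    StringToWordVector filter = new StringToWordVector();
    filter.setWordsToKeep(1000000);
    if (useIdf) {
        filter.setIDFTransform(true);
    }
    filter.setTFTransform(true);
    filter.setLowerCaseTokens(true);
    filter.setOutputWordCounts(true);
    filter.setMinTermFreq(minTermFreq);
    filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
    // word n-grams between minGrams and maxGrams tokens long
    NGramTokenizer t = new NGramTokenizer();
    t.setNGramMaxSize(maxGrams);
    t.setNGramMinSize(minGrams);
    filter.setTokenizer(t);
    WordsFromFile stopwords = new WordsFromFile();
    stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
    filter.setStopwordsHandler(stopwords);
    if (useStemmer) {
        Stemmer s = new /*Iterated*/LovinsStemmer();
        filter.setStemmer(s);
    }
    filter.setInputFormat(trainingData);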
    
    • Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);

    • Create the LibLinear Classifier

      1. SVMType 0 below corresponds to L2-regularized logistic regression.
      2. Set setProbabilityEstimates(true) to make the classifier output class probabilities.

        Classifier cls = null;
        LibLINEAR liblinear = new LibLINEAR();
        liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
        liblinear.setProbabilityEstimates(true);
        // liblinear.setBias(1); // default value
        cls = liblinear;
        cls.buildClassifier(trainingData);

    • Save model

      System.out.println("Saving the model...");
      ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(path + "mymodel.model"));
      oos.writeObject(cls);
      oos.flush();
      oos.close();

    • Create testing instances from an .arff file:

      Instances testingData = getDataFromFile(pathToTestArffFile);

    • Load classifier

    Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");

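    • Reuse the StringToWordVector filter from above, which has already built its word dictionary from trainingData, or create a new one for testingData. In the latter case, call filter.setInputFormat(trainingData) and run trainingData through it with Filter.useFilter(trainingData, filter) before filtering testingData, so that the dictionary comes from the training data. This keeps training and testing instances compatible. Note that the filter is not stored in mymodel.model, so a separate program also has to save and reload the filter (or wrap filter and classifier in a FilteredClassifier before saving). Alternatively, you could use InputMappedClassifier.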

    • Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);

    • Classify!

    1. Get the class value for every instance in the testing set:

    for (int j = 0; j < testingData.numInstances(); j++) {
        double res = myCls.classifyInstance(testingData.get(j));
        String predictedClass = testingData.classAttribute().value((int) res);
    }

    res is a double value that holds the index of the nominal class defined in the .arff file; testingData.classAttribute().value((int) res) turns it into the class name.


2. Get the probability distribution for every instance:

 for (int j = 0; j < testingData.numInstances(); j++) {
    double[] dist = myCls.distributionForInstance(testingData.get(j));
 }

dist is a double array that contains the probability of each class defined in the .arff file.
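Picking the most probable entry of dist and turning its index into a class name can be sketched in plain Java. This assumes hypothetical class values and a made-up distribution in place of a real classifier; for most Weka classifiers the index of the largest probability is also what classifyInstance returns, and labelOf mirrors what classAttribute().value((int) res) does. DistributionArgmax and its methods are illustrative names only:

```java
public class DistributionArgmax {
    // Index of the largest probability in dist (the first one wins on ties).
    static int argmax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    // Sum of the entries; a probability distribution should add up to about 1.
    static double sum(double[] dist) {
        double total = 0.0;
        for (double p : dist) {
            total += p;
        }
        return total;
    }

    // Turns a class index stored as a double (like the result of classifyInstance)
    // into its class name.
    static String labelOf(String[] classValues, double index) {
        return classValues[(int) index];
    }

    public static void main(String[] args) {
        String[] classValues = {"positive", "negative", "neutral"}; // hypothetical .arff class values
        double[] dist = {0.2, 0.7, 0.1};                            // hypothetical distributionForInstance result
        int best = argmax(dist);
        System.out.printf("predicted %s with p=%.2f (sum of dist = %.2f)%n",
                labelOf(classValues, best), dist[best], sum(dist));
    }
}
```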

Note: the classifier must support probability distributions, and for LibLINEAR they have to be enabled before training with: liblinear.setProbabilityEstimates(true);
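SVMType 0 is logistic regression, a probabilistic model: it computes a linear score from the features and passes it through the logistic function. As far as I know, the LIBLINEAR documentation lists probability estimates as supported for logistic regression only, which is why the type choice and setProbabilityEstimates(true) go together. A binary-case sketch with hypothetical weights and bias (LogisticProbability is an illustrative name); the L2 regularization only affects how the weights are learned, so it does not appear at prediction time:

```java
public class LogisticProbability {
    // Logistic (sigmoid) function: maps any real score to a value in (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // P(class = 1 | x) for a binary logistic regression model with weights w and bias b.
    static double probability(double[] w, double b, double[] x) {
        double z = b;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] w = {1.5, -2.0}; // hypothetical learned weights
        double b = 0.5;           // hypothetical bias
        double[] x = {1.0, 1.0};  // hypothetical feature vector (e.g. tf-idf values)
        double p1 = probability(w, b, x);
        System.out.printf("P(class 1) = %.3f, P(class 0) = %.3f%n", p1, 1.0 - p1);
    }
}
```

With more than two classes the real library combines several such scores, so this only illustrates where a single probability comes from.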

xro7
  • This code works fine when I use it in a simple Java project. When I moved to Java EE and used this code to get the model object, it threw an exception: cannot assign instance of java.util.ArrayList to field weka.core.Instances.m_Attributes of type weka.core.FastVector in instance of weka.core.Instances. It is strange that the same code works in a simple Java project but fails in Java EE. – Hammad Hassan Mar 08 '17 at 10:20
  • It is strange indeed. Are you using the same version of Weka? To the best of my knowledge, the FastVector class is deprecated, so maybe this is causing the issue. – xro7 Mar 08 '17 at 10:54
  • Previously I was using weka.jar, but there is no Maven dependency for it; there is one for weka-stable 3.6.6, so I am using that and got this exception. To avoid library confusion, I also tried the previous library, weka.jar, through the classpath, and got a ClassNotFound exception. So in Java EE I get exceptions no matter which library I use. – Hammad Hassan Mar 08 '17 at 11:03
  • I think the Weka 3.6.6 version is causing the exception. The ClassNotFound exception occurs because the jar is not linked properly. Try following the instructions here http://stackoverflow.com/questions/7672933/maven-how-to-include-jars-in-eclipse-which-are-not-available-in-repository to link your jar. By the way, what version is the weka.jar? – xro7 Mar 08 '17 at 11:19
  • I downloaded it from the Weka site a while ago. No version is mentioned in its name. – Hammad Hassan Mar 08 '17 at 13:03
  • I found that weka.jar is 3.7.12. – Hammad Hassan Mar 09 '17 at 09:27
  • I found the Maven dependency for weka-dev 3.7.12. That did not help either. Then I went with the latest version of weka-dev on Maven (3.9.1), and now the issue is resolved. Thanks a lot for staying with me on this issue. Your point about the version number really helped a lot. – Hammad Hassan Mar 09 '17 at 09:48
  • Glad I could help :) – xro7 Mar 09 '17 at 10:02
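The fix in this thread came down to which Weka jar was actually being used. Application servers can have layered class loaders, so a quick way to check is to print the jar or directory a class was loaded from. A small, stdlib-only sketch using the standard ProtectionDomain/CodeSource API; WhichJar and locationOf are illustrative names, and in the Weka case you would pass weka.core.Instances.class (the class from the error message), which needs Weka on the classpath:

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, or null if the JVM
    // reports none (for example for core JDK classes loaded by the bootstrap loader).
    static URL locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) {
        // Replace WhichJar.class with weka.core.Instances.class inside the real application.
        System.out.println("WhichJar loaded from: " + locationOf(WhichJar.class));
        System.out.println("String loaded from: " + locationOf(String.class));
    }
}
```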