I have asked a few broad questions about using Weka from C# and about WekaSharp, so I thought I would ask a more focused one to help me make progress on my own. I am working from the example on the Weka site that shows how to call Weka from C#. I would like to run part of the calculation in parallel, but I am not sure how to code it. Here is the raw code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using weka.classifiers.meta;
using weka.classifiers.functions;
using weka.core;
using java.io;
using weka.clusterers;
using System.Diagnostics;
using System.Threading;

// From http://weka.wikispaces.com/IKVM+with+Weka+tutorial

class MainClass
{
    public static void Main(string[] args)
    {
        System.Console.WriteLine("J48 in C#");
        classifyTest();
    }

    const int percentSplit = 66;
    public static void classifyTest()
    {
        try
        {
            weka.core.Instances insts = new weka.core.Instances(new java.io.FileReader(@"C:\Users\Deines\Documents\School\Software\WekaSharp2012\data\iris.arff"));
            insts.setClassIndex(insts.numAttributes() - 1);

            weka.classifiers.Classifier cl = new weka.classifiers.trees.J48();
            System.Console.WriteLine("Performing " + percentSplit + "% split evaluation.");

            //randomize the order of the instances in the dataset.
            weka.filters.Filter myRandom = new weka.filters.unsupervised.instance.Randomize();
            myRandom.setInputFormat(insts);
            insts = weka.filters.Filter.useFilter(insts, myRandom);

            int trainSize = insts.numInstances() * percentSplit / 100;
            int testSize = insts.numInstances() - trainSize;
            weka.core.Instances train = new weka.core.Instances(insts, 0, trainSize);

            cl.buildClassifier(train);
            int numCorrect = 0;
            for (int i = trainSize; i < insts.numInstances(); i++)
            {
                weka.core.Instance currentInst = insts.instance(i);
                double predictedClass = cl.classifyInstance(currentInst);
                if (predictedClass == insts.instance(i).classValue())
                    numCorrect++;
            }
            System.Console.WriteLine(numCorrect + " out of " + testSize + " correct (" +
                       (double)((double)numCorrect / (double)testSize * 100.0) + "%)");
        }
        catch (java.lang.Exception ex)
        {
            ex.printStackTrace();
        }
    }

}

I would like to run :

        for (int i = trainSize; i < insts.numInstances(); i++)
        {
            weka.core.Instance currentInst = insts.instance(i);
            double predictedClass = cl.classifyInstance(currentInst);
            if (predictedClass == insts.instance(i).classValue())
                numCorrect++;
        }

both sequentially and in parallel in order to compare the run times. I think the relevant API is System.Linq.ParallelExecutionMode, but I am not sure how to apply it in this case. Thank you very much.

Adriano Repetti

1 Answer

Why not System.Threading.Tasks.Parallel.For instead?

// Iterate over the test instances in parallel (requires System.Threading.Tasks).
Parallel.For(trainSize, insts.numInstances(), i =>
{
    weka.core.Instance currentInst = insts.instance(i);
    double predictedClass = cl.classifyInstance(currentInst);
    if (predictedClass == currentInst.classValue())
        Interlocked.Increment(ref numCorrect); // atomic update of the shared counter
});

Please note that I didn't run this code, so you may need to add some synchronization (monitors or locks) around access to any other shared data.
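To compare the sequential and parallel versions, one option is to time each with System.Diagnostics.Stopwatch. The following is only a minimal sketch that runs without Weka: `Classify`, `CountSequential`, `CountParallel` and the random data are hypothetical stand-ins for `cl.classifyInstance` and the iris instances, and it assumes the real classifier can safely be called from several threads at once.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class TimingSketch
{
    // Hypothetical stand-in for cl.classifyInstance(...): a pure, thread-safe function.
    public static double Classify(double[] features)
    {
        return features.Sum() > 1.0 ? 1.0 : 0.0;
    }

    // Plain loop, equivalent in shape to the original for loop.
    public static int CountSequential(double[][] data, double[] labels)
    {
        int numCorrect = 0;
        for (int i = 0; i < data.Length; i++)
            if (Classify(data[i]) == labels[i]) numCorrect++;
        return numCorrect;
    }

    // Same work with Parallel.For; the shared counter is updated atomically.
    public static int CountParallel(double[][] data, double[] labels)
    {
        int numCorrect = 0;
        Parallel.For(0, data.Length, i =>
        {
            if (Classify(data[i]) == labels[i])
                Interlocked.Increment(ref numCorrect);
        });
        return numCorrect;
    }

    public static void Main()
    {
        // Made-up data: labels are generated by Classify itself, so every row counts as correct.
        var rng = new Random(42);
        double[][] data = Enumerable.Range(0, 200000)
            .Select(n => new[] { rng.NextDouble(), rng.NextDouble() })
            .ToArray();
        double[] labels = data.Select(d => Classify(d)).ToArray();

        var sw = Stopwatch.StartNew();
        int seq = CountSequential(data, labels);
        sw.Stop();
        Console.WriteLine("Sequential: " + seq + " correct in " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        int par = CountParallel(data, labels);
        sw.Stop();
        Console.WriteLine("Parallel:   " + par + " correct in " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Both counts should match; the timings depend on the machine and on how expensive each classification is, so it is worth repeating the measurement a few times before drawing conclusions.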

Adriano Repetti
  • Thank you, I like this method better; this is why you are the expert :-). I will give this a go, it should be interesting, and I will note here for future readers whether I needed to add any sync code. – RedMassiveStar Dec 02 '13 at 23:20
  • You are humble as well as helpful :-) I did not need to add any sync code; it executed well and showed some improvement in efficiency on a big data set. – RedMassiveStar Dec 03 '13 at 04:47
  • @RedMassiveStar yes, the very first step of parallel programming is... profiling! If the overhead of parallel execution is high compared to the calculations themselves, then the improvement is smaller. – Adriano Repetti Dec 03 '13 at 07:35
  • @RedMassiveStar If you have time I would try to _unroll_ the parallel loop (perform at least two calculations inside the loop body, for i and i + 1) or use a [partitioner](http://msdn.microsoft.com/en-us/library/ff963547.aspx). The second step would be to make more things parallel (but you should profile which functions take time) using [futures](http://msdn.microsoft.com/en-us/library/ff963556.aspx) and [async](http://msdn.microsoft.com/en-us/library/hh191443.aspx). – Adriano Repetti Dec 03 '13 at 07:38
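
The partitioner suggestion in the last comment could look roughly like the sketch below. It is an illustration under assumptions, not tested against Weka: `IsCorrect` and `CountWithRanges` are hypothetical names, with `IsCorrect` standing in for the "classify and compare with classValue()" check. `Partitioner.Create(fromInclusive, toExclusive)` hands each worker a contiguous range of indices, so the delegate runs once per chunk rather than once per instance, and the shared counter is touched once per chunk.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class PartitionSketch
{
    // Hypothetical stand-in for the per-instance check in the original loop.
    public static bool IsCorrect(int i)
    {
        return i % 3 != 0;
    }

    // Note: Partitioner.Create throws if toExclusive <= fromInclusive.
    public static int CountWithRanges(int fromInclusive, int toExclusive)
    {
        int numCorrect = 0;
        Parallel.ForEach(Partitioner.Create(fromInclusive, toExclusive), range =>
        {
            // Count within this chunk using a local variable (no synchronization needed).
            int local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                if (IsCorrect(i)) local++;

            // Merge the chunk's result into the shared total with one atomic add.
            Interlocked.Add(ref numCorrect, local);
        });
        return numCorrect;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithRanges(0, 100) + " correct");
    }
}
```

In the question's code the range would be `trainSize` to `insts.numInstances()`. Whether this beats the plain Parallel.For depends on how cheap each classification is, which is again a matter for profiling.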