I have been using the Java version of libsvm for many data mining problems. However, I noticed that even on a multicore machine libsvm uses only one core; it does not parallelize the computation. When I searched the FAQ, I found a C++ solution (http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f432). The existing Java method looks like this.

@Override
float[] get_Q( int i, int len )
{
    float[][] data = new float[1][];
    int start, j;
    if ( ( start = cache.get_data( i, data, len ) ) < len )
    {
        for ( j = start; j < len; j++ )
        {
            data[0][j] = ( float ) ( y[i] * y[j] * kernel_function( i, j ) );
        }

    }
    return data[0];
}

I applied the same idea in Java by changing the for loop of get_Q in class SVC_Q as follows.

@Override
float[] get_Q( int i, int len )
{

    float[][] data = new float[1][];
    int start, j;
    if ( ( start = cache.get_data( i, data, len ) ) < len )
    {
        ExecutorService executorService = Executors.newFixedThreadPool( Runtime.getRuntime()
                .availableProcessors() ); // number of threads

        for ( j = start; j < len; j++ )
        {
            final int count = j;
            executorService.submit( new Runnable()
            {
                @Override
                public void run()
                {
                    data[0][count] = ( float ) ( y[i] * y[count] * kernel_function( i, count ) );
                }
            } );

        }
        executorService.shutdown();

    }
    return data[0];
}

After the change the code uses all cores on my machine, but the results got worse: the percentage of correctly classified instances on a new test set dropped from 78% to 58%, and the training time did not decrease either. So obviously I am not doing it right. Is there a proper way to parallelize libsvm? What is the mistake in my code?

Tharindu

1 Answer

Please avoid trying to parallelize any code if you do not know how to write multi-threaded/parallel code.

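In this particular case, you need to wait for the executor to finish all of its jobs before you return the result. As written, get_Q can return data[0] while some elements have not been computed yet, which would explain the drop in accuracy.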

However, waiting for the jobs does not by itself mean that the kernel_function method is thread-safe; that has to be checked separately.
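
A minimal sketch of the fix described above, under stated assumptions: `kernel` and `label` are hypothetical stand-ins for libsvm's `kernel_function` and `y`, and the class and method names are made up for illustration. It splits the index range into one chunk per thread rather than one task per element, and relies on `invokeAll` plus `Future.get` so the method does not return until every element has been written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRowSketch {
    // Hypothetical stand-in for kernel_function(i, j); it only reads its arguments.
    static double kernel(int i, int j) {
        return Math.exp(-0.01 * (i - j) * (i - j));
    }

    // Hypothetical stand-in for the label array y: alternating +1 / -1.
    static int label(int k) {
        return (k % 2 == 0) ? 1 : -1;
    }

    // Reference version: the original single-threaded loop.
    static float[] fillSequential(int i, int start, int len) {
        float[] row = new float[len];
        for (int j = start; j < len; j++) {
            row[j] = (float) (label(i) * label(j) * kernel(i, j));
        }
        return row;
    }

    // Parallel version: one task per chunk, and no return before all tasks finish.
    static float[] fillParallel(final int i, final int start, final int len, int threads) {
        final float[] row = new float[len];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            int chunk = Math.max(1, (len - start + threads - 1) / threads);
            for (int lo = start; lo < len; lo += chunk) {
                final int from = lo;
                final int to = Math.min(len, lo + chunk);
                tasks.add(() -> {
                    for (int j = from; j < to; j++) {
                        row[j] = (float) (label(i) * label(j) * kernel(i, j));
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has completed.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown inside a task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while filling row", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a row task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return row;
    }

    public static void main(String[] args) {
        float[] seq = fillSequential(3, 0, 1000);
        float[] par = fillParallel(3, 0, 1000, 4);
        System.out.println(java.util.Arrays.equals(seq, par));
    }
}
```

In real use the pool would probably be created once and reused, since building one per call adds overhead of its own, and none of this helps unless the actual kernel function is safe to call from several threads at once.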

Raff.Edward
  • Thank you, this solved the accuracy issue, even though the execution time remains the same, so I guess kernel_function is not thread safe. But I wonder why the FAQ had the solution in C++ like this: #pragma omp parallel for private(j) schedule(guided) for(j=start;j – Tharindu Sep 22 '15 at 04:23
  • What the omp pragma does and what you wrote are not 100% equivalent. Your code here has significant overhead, because it starts a job for each element. This is why I recommended you not attempt to parallelize code if you do not know anything about writing multi-threaded code. I would recommend you read this ( http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=pd_sim_14_1?ie=UTF8&refRID=08GYEMBGRC75QHPEZQ4D&dpID=51YGGeOM%2B-L&dpSrc=sims&preST=_AC_UL160_SR104%2C160_ ) before attempting these kinds of changes on your own. – Raff.Edward Sep 22 '15 at 17:59
  • Being thread-safe has nothing to do with speedup, it is whether or not a method/variable is *safe* to be used from more than one thread at the same time. – Raff.Edward Sep 22 '15 at 18:00
  • So what would be the equivalent Java code for the C++ code: #pragma omp parallel for private(j) schedule(guided) for(j=start;j – Tharindu Sep 24 '15 at 05:03
  • @raff Please don't discourage beginners from learning parallel computing. Your answer recommends that nobody attempt to write parallel code without already knowing how to do it, which effectively blocks the easiest route to learning parallel computing. Yeah, they might write some horrible code when beginning, but it would be far more constructive to direct them to appropriate resources or just answer their question directly instead of chastising them for trying to learn something new. – Brendan Wood Oct 19 '16 at 04:03
  • I don't think my answer discourages learning that, but you're welcome to that thought. I only discouraged trying to parallelize others' code if you don't know how to do it. I think it is fairly obvious that doesn't apply if you are performing an exercise in learning how to do so. It was clear to me that the purpose was not learning in this case, and thus a bad avenue for the OP to pursue without prior experience. – Raff.Edward Oct 19 '16 at 17:44
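
On the question in the comments about a Java counterpart to the OpenMP loop: one rough approximation, not an exact equivalent since Java has no direct analogue of `schedule(guided)`, is a parallel `IntStream`. The sketch below assumes Java 8 or later; `kernel` and `label` are hypothetical stand-ins for `kernel_function` and `y`, and the class name is made up. `forEach` on a parallel stream does not return until all indices have been processed, and each index writes to its own array slot.

```java
import java.util.stream.IntStream;

class ParallelStreamRowSketch {
    // Hypothetical stand-in for kernel_function(i, j); pure, so safe to call concurrently.
    static double kernel(int i, int j) {
        return 1.0 / (1.0 + Math.abs(i - j));
    }

    // Hypothetical stand-in for the label array y.
    static int label(int k) {
        return (k % 3 == 0) ? 1 : -1;
    }

    // Fills row[start..len) in parallel; the runtime splits the range into chunks itself.
    static float[] fillRow(int i, int start, int len) {
        float[] row = new float[len];
        IntStream.range(start, len)
                 .parallel()
                 .forEach(j -> row[j] = (float) (label(i) * label(j) * kernel(i, j)));
        return row;
    }

    public static void main(String[] args) {
        float[] row = fillRow(0, 0, 8);
        System.out.println(row.length);
    }
}
```

As with any parallel version, this is only correct if the real kernel function does not mutate shared state, and it only pays off when each row is expensive enough to outweigh the scheduling cost.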