
I am trying to compare performance between parallel streams in Java 8 and PLINQ (C#/.Net 4.5.1).

Here are the results (in milliseconds) I get on my machine (Dell Precision M4700; Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz, 4 cores, 8 logical processors; 16.0 GB RAM; Microsoft Windows 7 Enterprise, version 6.1.7601 Service Pack 1, build 7601):

C# .Net 4.5.1 (X64-release)

Serial:

470.7784, 491.4226, 502.4643, 481.7507, 464.1156, 463.0088, 546.149, 481.2942, 502.414, 483.1166

Average: 490.6373

Parallel:

158.6935, 133.4113, 217.4304, 182.3404, 184.188, 128.5767, 160.352, 277.2829, 127.6818, 213.6832

Average: 180.5496

Java 8 (X64)

Serial:

471.911822, 333.843924, 324.914299, 325.215631, 325.208402, 324.872828, 324.888046, 325.53066, 325.765791, 325.935861

Average:326.241715

Parallel:

212.09323, 73.969783, 68.015431, 66.246628, 66.15912, 66.185373, 80.120837, 75.813539, 70.085948, 66.360769

Average:70.3286

It looks like PLINQ does not scale across the CPU cores as well as Java's parallel streams do. Am I missing something?
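For reference, the speedups implied by the averages reported above (serial average divided by parallel average, first run excluded in both cases) work out to roughly the following. The symbol $S$ is introduced here only for illustration:

```latex
S_{\text{C\#}}  = \frac{490.6373}{180.5496} \approx 2.72
\qquad
S_{\text{Java}} = \frac{326.241715}{70.3286} \approx 4.64
```

Both figures come from a machine with 4 physical cores and 8 logical processors.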

Here is the code for C#:

using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var NUMBER_OF_RUNS = 10;
        var size = 10000000;
        var vals = new double[size];

        // Fill the input array with random values.
        var rnd = new Random();
        for (int i = 0; i < size; i++)
        {
            vals[i] = rnd.NextDouble();
        }

        var avg = 0.0;
        Console.WriteLine("Serial:");
        for (int i = 0; i < NUMBER_OF_RUNS; i++)
        {
            var watch = Stopwatch.StartNew();
            var res = vals.Select(v => Math.Sin(v)).ToArray();
            var elapsed = watch.Elapsed.TotalMilliseconds;
            Console.Write(elapsed + ", ");

            // The first run is treated as warm-up and excluded from the average.
            if (i > 0)
                avg += elapsed;
        }
        Console.Write("\nAverage: " + (avg / (NUMBER_OF_RUNS - 1)));

        avg = 0.0;
        Console.WriteLine("\n\nParallel:");
        for (int i = 0; i < NUMBER_OF_RUNS; i++)
        {
            var watch = Stopwatch.StartNew();
            var res = vals.AsParallel().Select(v => Math.Sin(v)).ToArray();
            var elapsed = watch.Elapsed.TotalMilliseconds;
            Console.Write(elapsed + ", ");

            // The first run is treated as warm-up and excluded from the average.
            if (i > 0)
                avg += elapsed;
        }
        Console.Write("\nAverage: " + (avg / (NUMBER_OF_RUNS - 1)));
    }
}

Here is the code for Java:

import java.util.Arrays;
import java.util.Random;
import java.util.stream.DoubleStream;

public class Main {
    private static final int POPULATION_SIZE = 10_000_000;
    public static final int NUMBER_OF_RUNS = 10;

    public static void main(String[] args) {
        // Fill the input array with random values.
        Random rnd = new Random();
        double[] vals1 = DoubleStream.generate(rnd::nextDouble).limit(POPULATION_SIZE).toArray();

        double avg = 0.0;
        System.out.println("Serial:");
        for (int i = 0; i < NUMBER_OF_RUNS; i++)
        {
            long start = System.nanoTime();
            double[] res = Arrays.stream(vals1).map(Math::sin).toArray();
            double duration = (System.nanoTime() - start) / 1_000_000.0;
            System.out.print(duration + ", ");

            // The first run is treated as warm-up and excluded from the average.
            if (i > 0)
                avg += duration;
        }
        System.out.println("\nAverage:" + (avg / (NUMBER_OF_RUNS - 1)));

        avg = 0.0;
        System.out.println("\n\nParallel:");
        for (int i = 0; i < NUMBER_OF_RUNS; i++)
        {
            long start = System.nanoTime();
            double[] res = Arrays.stream(vals1).parallel().map(Math::sin).toArray();
            double duration = (System.nanoTime() - start) / 1_000_000.0;
            System.out.print(duration + ", ");

            // The first run is treated as warm-up and excluded from the average.
            if (i > 0)
                avg += duration;
        }
        System.out.println("\nAverage:" + (avg / (NUMBER_OF_RUNS - 1)));
    }
}

assylias
Bijan
  • Your java benchmark would be more precise if you placed the code of each scenario (sequential vs. parallel) in its own method and ran the method a bit more to make sure it gets properly compiled. You should then check each run timing and reject the times before compilation. Or even better, use a proper [benchmarking framework](http://openjdk.java.net/projects/code-tools/jmh/). A good read: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java. I suspect the same observation applies to your C# benchmark. – assylias May 20 '14 at 22:38
  • "It looks like PLINQ does not scale across the CPU cores." - Based on what? – Preston Guillot May 20 '14 at 22:39
  • @PrestonGuillot Based on the ratio between serial and parallel timing, which was far worse than in the case of Java. – Marko Topolnik May 21 '14 at 08:26
  • That means that the run time of *this* doesn't scale linearly with respect to number of CPU cores - it's not a sufficient indictment of anything else. – Preston Guillot May 21 '14 at 14:51
  • This is not a trial and the question is not an indictment. You seem to be taking this too personally. OP has demonstrated an "embarrassingly" parallelizable case for which PLINQ's performance does not scale with CPU cores. The simplicity of the case hints at quite wide applicability of the results. – Marko Topolnik May 22 '14 at 08:40
  • Just out of curiosity, how does Parallel.ForEach fare? – Linkgoron May 24 '14 at 17:51
  • @user3658553 Why are you calling .ToArray() after your .Select()? – Kai Eichinger May 26 '14 at 16:57
  • Java always was a bit faster, the difference diverges even more in jdk8 and probably 9. HotSpot is a beast, compared to stock .NET runtime. The Streams implementation is very efficient and wisely developed. Also, virtual method calls are much faster in stock Java due to HotSpot optimizations. No magic here. – Kr0e May 01 '15 at 10:05
  • Benchmarking PLINQ and Streams is like benchmarking XMLDOM. If you cared about performance you would not be using any of these things. C#'s performance advantage comes from its ability to perform advanced stack allocation, its ability to call into C libraries with no overhead, and the support for direct memory access using pointers that Java will never, by design, be able to match. Java performance is essentially for stupid people. – hoodaticus Sep 28 '16 at 15:44
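
The warm-up advice in the comments above (one method per scenario, with the early runs discarded) could be sketched roughly as follows. This is a minimal illustration, not a replacement for JMH: the class name `WarmedUpBenchmark`, the helper `averageMillis`, and the run counts are assumptions made for this example, and a hand-rolled loop like this still cannot rule out every JIT or GC effect.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;
import java.util.stream.DoubleStream;

public class WarmedUpBenchmark {
    // Each scenario lives in its own method so the JIT can compile it separately.
    static double[] serial(double[] vals) {
        return Arrays.stream(vals).map(Math::sin).toArray();
    }

    static double[] parallel(double[] vals) {
        return Arrays.stream(vals).parallel().map(Math::sin).toArray();
    }

    // Runs the scenario warmupRuns times without timing it, then returns the
    // average wall-clock time in milliseconds over measuredRuns timed runs.
    static double averageMillis(Supplier<double[]> scenario, int warmupRuns, int measuredRuns) {
        long checksum = 0; // consume results so the work is not obviously dead code
        for (int i = 0; i < warmupRuns; i++) {
            checksum += scenario.get().length;
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            double[] res = scenario.get();
            totalNanos += System.nanoTime() - start;
            checksum += res.length;
        }
        if (checksum < 0) {
            throw new IllegalStateException("unreachable; keeps checksum live");
        }
        return totalNanos / 1_000_000.0 / measuredRuns;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double[] vals = DoubleStream.generate(rnd::nextDouble).limit(10_000_000).toArray();

        // Run counts are arbitrary illustrative values.
        System.out.println("Serial average (ms):   " + averageMillis(() -> serial(vals), 20, 10));
        System.out.println("Parallel average (ms): " + averageMillis(() -> parallel(vals), 20, 10));
    }
}
```

Discarding a fixed number of warm-up runs is only a heuristic; a benchmarking framework such as JMH handles warm-up, forking, and dead-code elimination more rigorously.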

1 Answer


Both runtimes make a decision about how many threads to use in order to complete the parallel operation. That is a non-trivial task that can take many factors into account, including the degree to which the task is CPU bound, the estimated time to complete the task, etc.

Each runtime makes different decisions about how many threads to use to resolve the request. Neither decision is obviously right or wrong in terms of system-wide scheduling, but the Java strategy completes the benchmark faster (and leaves fewer CPU resources available for other tasks on the system).

Eric J.
  • By default Java's parallel streams use a number of threads equal to the number of processors (minus one). – assylias May 20 '14 at 22:45
  • The CPU resources needed to complete a task are in fact constant. The only tradeoff is between using them intensely for a short period or less intensely for longer. – Marko Topolnik May 22 '14 at 08:43
  • @assylias Actually, the "minus one" part is not true: the CommonPool is sized with `availableProcessors()-1`, but the parallel stream uses the *calling* thread in addition to all the pool's threads. This is a natural choice for synchronous operations, which streams are ultimately about. – Marko Topolnik May 22 '14 at 08:45
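
The thread counts discussed in the comments above can be inspected on a given machine with a short sketch like the one below. The class name `PoolSize` and its helper methods are made up for this example. It assumes default settings: the `java.util.concurrent.ForkJoinPool.common.parallelism` system property can override the common pool size.

```java
import java.util.concurrent.ForkJoinPool;

public class PoolSize {
    // Number of logical processors the JVM reports.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonPoolWorkers() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // The calling thread also takes part in a parallel stream's work,
    // so one more thread than the pool's workers can be busy.
    static int threadsAvailableToStream() {
        return commonPoolWorkers() + 1;
    }

    public static void main(String[] args) {
        System.out.println("Logical processors:          " + cores());
        System.out.println("Common pool parallelism:     " + commonPoolWorkers());
        System.out.println("Threads usable by a stream:  " + threadsAvailableToStream());
    }
}
```

On a multi-core machine with default settings, the last figure would typically match the number of logical processors, which is consistent with the "pool plus calling thread" explanation above.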