1

I'm writing a Monte Carlo simulation in Java that involves generating a lot of random integers. My thinking was that native code would be faster for random number generation, so I should write the code in C++ and return the output via JNI. But when I wrote the same method in C++, it actually takes longer to execute than the Java version. Here are the code samples:

Random rand = new Random();
int threshold = 5;
int[] composition = {10, 10, 10, 10, 10};
for (int j = 0; j < 100000000; j++) {
    rand.setSeed(System.nanoTime());
    double sum = 0;
    for (int i = 0; i < composition[0]; i++) sum += carbon(rand);
    for (int i = 0; i < composition[1]; i++) sum += hydrogen(rand);
    for (int i = 0; i < composition[2]; i++) sum += nitrogen(rand);
    for (int i = 0; i < composition[3]; i++) sum += oxygen(rand);
    for (int i = 0; i < composition[4]; i++) sum += sulfur(rand);
    if (sum < threshold) {}//execute some code
    else {}//execute some other code
}

And the equivalent code in C++:

int threshold = 5;
int composition [5] = {10, 10, 10, 10, 10};
for (int i = 0; i < 100000000; i++)
{
    srand(time(0));
    double sum = 0;
    for (int i = 0; i < composition[0]; i++) sum += carbon();
    for (int i = 0; i < composition[1]; i++) sum += hydrogen();
    for (int i = 0; i < composition[2]; i++) sum += nitrogen();
    for (int i = 0; i < composition[3]; i++) sum += oxygen();
    for (int i = 0; i < composition[4]; i++) sum += sulfur();
    if (sum > threshold) {}
    else {}
}

All of the element methods (carbon, hydrogen, etc) just generate a random number and return a double.

Runtimes are 77.471 sec for the Java code, and 121.777 sec for C++.

Admittedly I'm not very experienced in C++ so it's possible that the cause is just badly written code.

Beryllium
  • 12,808
  • 10
  • 56
  • 86
Sevy
  • 688
  • 4
  • 11
  • Did you have compiler optimisations enabled for the C++ version? – Oliver Charlesworth Jul 25 '13 at 17:44
  • 1
    The overhead of JNI is likely to exceed any performance benefit from trying to use C++ in Java. – Louis Wasserman Jul 25 '13 at 17:44
  • @OliCharlesworth I've been using the gcc compiler that comes with CodeBlocks IDE – Sevy Jul 25 '13 at 17:46
  • What's the question? Are you looking for an explanation or a way to make the C++ code faster? – DannyMo Jul 25 '13 at 17:49
  • Explanation - I thought that c++ code would be faster since it doesn't have to work through the JVM. Is it because of bad code? Lack of compiler optimizations? Or is random number generation inherently similar in speed between the two languages? – Sevy Jul 25 '13 at 17:56
  • It's not clear whether your comparison is "Java vs. C++" or "Java vs. Java/JNI/C++". Have you considered these [benchmark recommendations](http://stackoverflow.com/a/513259/2390083)? – Beryllium Jul 25 '13 at 18:02
  • Sorry for the lack of clarity. My original intent was to run the simulation in c++, and return the values to Java through JNI. But, when I started working on the c++ code, I noticed that even without involving JNI at all, the c++ code is significantly slower. My question: is this because of bad c++ code, and it can be made to run faster? Or is it the nature of c++ random number generation that it's not much better than Java? – Sevy Jul 25 '13 at 18:17
  • Do you actually mean to call `srand` a gazillion times? That seems pretty bad choice to me... (And it is far from a lightweight operation in the GNU standard C library - other standard libraries may differ). – Mats Petersson Jul 25 '13 at 18:42
  • 1
    OK, using `gcc -O3`, I could reproduce your results. I have no definitive sources, but I suspect that either the methods of `java.util.Random` are intrinsic operations in the JVM or the Java random number generation is different (= faster). On the other hand, `gcc` has only one shot to optimize whereas the JIT has much more time to optimize (in iterations). – Beryllium Jul 25 '13 at 18:48
  • How much shorter is it without the call to `srand`? – Mats Petersson Jul 25 '13 at 18:56
  • When I move srand outside the loop, runtime decreases to 99.675 sec. I wanted to keep setting the seed so the results would not be deterministic (closer to being truly random) – Sevy Jul 25 '13 at 19:12
  • I notice the Java version has the condition `if (sum < threshold)` and the C++ version has the condition `if (sum > threshold)`. – bames53 Jul 25 '13 at 23:17

2 Answers2

1

Java (actually the JIT) is generally very good at detecting code which doesn't do anything useful. This is because the JIT can obtain information at runtime a static compiler cannot determine. For code which can be optimised away, Java can actually be faster than C++. In general however, a well tuned C++ program is faster than one in Java.

In short, given any amount of time, C++ will be faster for a well understand, well tuned program. However, given limited resources, changing requirements and teams of mixed ability Java can often outperform C++ by a significant margin.

All that said, it could be that the random in C++ is better, but more expensive.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
1

I suspect that the performance issue is in the bodies of your carbon(), hydrogen(), nitrogen(), oxygen(), and sulfur() functions. You should show how they produce the random data.

Or it could be in the if (sum < threshold) {} else {} code.

I wanted to keep setting the seed so the results would not be deterministic (closer to being truly random)

Since you're using the result of time(0) as a seed you're not getting particularly random results either way.

Instead of using srand() and rand() you should take a look at the <random> library and choose an engine with the performance/quality characteristics that meed your needs. If your implementation supports it you can even get non-deterministic random data from std::random_device (either to generate seeds or as an engine).

Additionally <random> provides pre-made distributions such as std::uniform_real_distribution<double> which is likely to be better than the average programmer's method of manually computing the distribution you want from the results of rand().


Okay, here's how you can eliminate the inner loops from your code and drastically speed it up (In Java or C++).

Your code:

double carbon() {
  if (rand() % 10000 < 107)
    return 13.0033548378;
  else
    return 12.0;
}

picks one of two values with a particular probability. Presumably you intended the first value to be picked about 107 times out of 10000 (although using % with rand() doesn't quite give you that). When you run this in a loop and sum the results as in:

for (int i = 0; i < composition[0]; i++) sum += carbon();

you'll essentially get sum += X*13.0033548378 + Y*12.0; where X is the number of times the random number stays under the threshold and Y is (trials-X). It just so happens that you can simulate running a bunch of trials and calculating the number of successes using a binomial distribution, and <random> happens to provide a binomial distribution.

Given a function sum_trials()

std::minstd_rand0 eng; // global random engine

double sum_trials(int trials, double probability, double A, double B) {
  std::binomial_distribution<> dist(trials, probability);
  int successes = dist(eng);
  return successes*A + (trials-successes)*B;
}

You can replace your carbon() loop:

sum += sum_trials(composition[0], 107.0/10000.0, 13.003354378, 12.0); // carbon trials

I don't have the actual values you're using, but your whole loop will look something like:

  for (int i = 0; i < 100000000; i++) {
     double sum = 0;
     sum += sum_trials(composition[0], 107.0/10000.0, 13.003354378, 12.0); // carbon trials
     sum += sum_trials(composition[1], 107.0/10000.0, 13.003354378, 12.0); // hydrogen trials
     sum += sum_trials(composition[2], 107.0/10000.0, 13.003354378, 12.0); // nitrogen trials
     sum += sum_trials(composition[3], 107.0/10000.0, 13.003354378, 12.0); // oxygen trials
     sum += sum_trials(composition[4], 107.0/10000.0, 13.003354378, 12.0); // sulfur trials

     if (sum > threshold) {
     } else {
     }
   }

Now one thing to note is that inside the function we're constructing distributions over and over with the same data. We can extract that by replacing the function sum_trials() with a function object, which we construct with the appropriate data once before the loop, and then just use the functor repeatedly:

struct sum_trials {
  std::binomial_distribution<> dist;
  double A; double B; int trials;

  sum_trials(int t, double p, double a, double b) : dist{t, p}, A{a}, B{b}, trials{t} {}

  double operator() () {
    int successes = dist(eng);
    return successes * A + (trials - successes) * B;
  }
};

int main() {
  int threshold = 5;
  int composition[5] = { 10, 10, 10, 10, 10 };

  sum_trials carbon   = { composition[0], 107.0/10000.0, 13.003354378, 12.0};
  sum_trials hydrogen = { composition[1], 107.0/10000.0, 13.003354378, 12.0};
  sum_trials nitrogen = { composition[2], 107.0/10000.0, 13.003354378, 12.0};
  sum_trials oxygen   = { composition[3], 107.0/10000.0, 13.003354378, 12.0};
  sum_trials sulfur   = { composition[4], 107.0/10000.0, 13.003354378, 12.0};


  for (int i = 0; i < 100000000; i++) {
     double sum = 0;

     sum += carbon();
     sum += hydrogen();
     sum += nitrogen();
     sum += oxygen();
     sum += sulfur();

     if (sum > threshold) {
     } else {
     }
   }
}

The original version of the code took my system about one minute 30 seconds. The last version here takes 11 seconds.


Here's a functor to generate the oxygen sums using two binomial_distributions. Maybe one of the other distributions can do this in one shot but I don't know.

struct sum_trials2 {
  std::binomial_distribution<> d1;
  std::binomial_distribution<> d2;
  double A; double B; double C;
  int trials;
  double probabilty2;

  sum_trials2(int t, double p1, double p2, double a, double b, double c)
    : d1{t, p1}, A{a}, B{b}, C{c}, trials{t}, probability2{p2} {}

  double operator() () {
    int X = d1(eng);
    d2.param(std::binomial_distribution<>{trials-X, p2}.param());
    int Y = d2(eng);

    return X*A + Y*B + (trials-X-Y)*C;
  }
};

sum_trials2 oxygen{composition[3], 17.0/1000.0, (47.0-17.0)/(1000.0-17.0), 17.9999, 16.999, 15.999};

You can further speed this up if you can just calculate the probability that the sum is under your threshold:

int main() {
  std::minstd_rand0 eng;
  std::bernoulli_distribution dist(probability_sum_is_over_threshold);

  for (int i=0; i< 100000000; ++i) {
    if (dist(eng)) {
    } else {
    }
  }
}

Unless the values for the other elements can be negative then the probability that the sum is greater than five is 100%. In that case you don't even need to generate random data; execute the 'if' branch of your code 100,000,000 times.

int main() {
  for (int i=0; i< 100000000; ++i) {
    //execute some code
  }
}
bames53
  • 86,085
  • 15
  • 179
  • 244
  • The carbon, hydrogen, etc methods are pretty simple, they are all similar to this: private double carbon () { if (rand()%10000 < 107) return 13.0033548378; else return 12.0; } And the if (sum – Sevy Jul 25 '13 at 21:06
  • @user2004343 I've updated my answer on the basis of that `carbon()` function. – bames53 Jul 25 '13 at 23:17
  • I like the idea of using the binomial distribution to speed up the analysis, I hadn't thought of that. The only problem is that for oxygen and sulfur methods there are actually three possible masses - so the method looks more like this: double oxygen() { outcome = rand()%1000; if (outcome<17) return 17.9999; else if (outcome < 47) return 16.999; else return 15.999} – Sevy Jul 26 '13 at 13:46
  • An update, in case you're interested: I did get it working with the methods that involve three different return values. I first used the binomial distribution to get the number of occurrences of an isotope of either type. Then, out of those limited number of occurrences, I randomly generated numbers and used these to determine if the isotope would be O17 or O18. It sped up my code a lot, thanks for your help – Sevy Jul 26 '13 at 20:39