Speeding up Monte Carlo Pi

Question

I have written a C++-program for calculating pi via the "throw random points into a quarter of a circle and count them etc.". Now my program is a bit slow in my opinion, and I have thought about some improvements to speed it up (source code is below).
My first idea is to make it multithreaded using OpenMP, i.e. split the code between (I) and (II) up into several threads so that I have for example nearly ten times more rounds without having to wait longer (on a octacore system).
Another idea I had was to use global variables and to use pointer, so that I just have to copy pointer and not tuples of integer. Drawback is (idk)?
So, what else can I do to speed the program up? I am working mainly with windows, but I can use Unix/Linux too.
Thank you very much!

Code Section:

    #include <cstdlib>
    #include <iostream>
    #include <tuple>
    #include <math.h>
    #include <time.h>
    #include <omp.h>
    #include <sys/time.h>

    #define RAND_MAX 32000
    #define LOOPS 1000000

    inline std::tuple<int, int> Throw_points(void)
    {

        int i = 0, j = 0;
        i = rand() % 1000;
        j = rand() % 1000;
        return std::make_tuple(i, j);
    }

    inline bool is_in_circle(std::tuple<int, int> point)
    {
        if ((pow(std::get<0>(point), 2) + pow(std::get<1>(point), 2)) <= pow(1000, 2))
            return true;
        else
            return false;
    }

    inline double pi(void)
    {
        srand(time(NULL));
        long long int in_circle = 0;
        long long int out_circle = 0;
        for (int i = 0; i < LOOPS; i++)
        {
            if (is_in_circle(Throw_points()))
                in_circle++;
            out_circle++;
        }
        return double(in_circle) / double(out_circle) * 4;
    }

Call via pi()

It is a slow way to compute `π`; IIRC my math courses, precision is `1/sqrt(LOOPS)` (but I could be wrong). So read some math books to compute `π` faster! And read wikipage on [approximations of π](http://en.wikipedia.org/wiki/Approximations_of_%CF%80) — Basile Starynkevitch, Nov 11 '13 at 17:55
Won't help a lot, but you should cache the result of `pow(1000, 2)`, or at last replace it with `1000*1000`, which is a lot faster then `pow` and the compiler will cache the result for you. — Vinícius Gobbo A. de Oliveira, Nov 11 '13 at 17:57
@BasileStarynkevitch: I have to use this method, otherwise I would have already switched... — arc_lupus, Nov 11 '13 at 17:58
Then use [openmp](http://en.wikipedia.org/wiki/Openmp) `#pragma`-s — Basile Starynkevitch, Nov 11 '13 at 18:00
Also, add a & to the function parameter of is_in_circle. You keep copying your tuples now, whereas you could pass it in by reference. The comment by @ViníciusGobboA.deOliveira about omitting pow(...) also holds for the other squaring operations: I expect that `std::get<0>(point) * std::get<0>(point)` is much faster than `pow(std::get<0>(point), 2)`. — CompuChip, Nov 11 '13 at 18:00
You may want to check this entry: https://gist.github.com/skeeto/212715 — OnoSendai, Nov 11 '13 at 18:00
As a suggestion, change the number generator, standard `rand` function is fast but suffers from some distribution problems (cycle loop, etc) and you may end up repeating calculations — higuaro, Nov 11 '13 at 18:01

score 1 · Accepted Answer · edited May 23 '17 at 10:25

I just played around with this a bit. Actually all of the suggestions in the comments to the original post (including my own) made virtually no difference at all.

However, getting rid of the tuple

inline void Throw_points(int&i, int&j)
{
    i = rand() % 1000;
    j = rand() % 1000;
}

inline bool is_in_circle(int i, int j)
{
    return i*i + j*j < 1000000;        
}

sped up the program by a factor of 5.

I used the boost::progress_timer solution from here, by the way: How to get the time elapsed running a function in C++

score 1 · Answer 2 · answered Nov 11 '13 at 18:23

1

Observation on perf. Use a profiling tool; this tells you where your code is spending its time. Generally this is always surprise.

if you are in gcc land use gprof

answered Nov 11 '13 at 18:23

pm100

48,078
23
82
145

Mike Seymour · Answer 3 · 2013-11-11T18:16:26.303

0

A few random observations:

Squaring by multiplication is probably faster than calling pow. In particular, you don't want to calculate the constant pow(1000,2) each time.
int counters might be faster than long long - you're already constraining the number of loops to be representable by int.
Passing by reference might be faster. Or it might be slower, since the type is small. Or it might make no difference, since the function should be inlined.
if (X) return true; else return false; rather than return X; is weird, but probably doesn't affect performance.
rand() probably isn't random enough for Monte Carlo simulations; it's intended to be fast, but not high quality. Unfortunately, good pseudorandom generators are quite slow. The C++11 library has several options.

If you do make it multi-threaded, then make sure each thread has a different random seed; otherwise, they'd just duplicate each other's work. You won't be able to use rand() since it isn't thread-safe.

edited Nov 11 '13 at 18:16

answered Nov 11 '13 at 18:01

Mike Seymour

249,747
28
448
644

Using rand is a prerequisite for this, and pi is quite good (3.145 etc.) – arc_lupus Nov 11 '13 at 18:03
Yes, but I think Mike's point is that rand() isn't a high-quality PRNG. – OnoSendai Nov 11 '13 at 18:04
@PeteBecker: I know that, but we only have to calculate pi up to the second number after the komma... – arc_lupus Nov 11 '13 at 18:05
@arc_lupus: If it's good enough for this assignment, then fine. But it's worth bearing in mind that it's not very good when you need statistically good randomness. – Mike Seymour Nov 11 '13 at 18:05
Good point about `rand()`. It is sufficiently random for this specific task, but for real-life problems I wouldn't trust it at all. Aside from using `rand`, a possibly more serious problem is using the % operator to limit the range: some values will occur (slightly) more often than others. – Jeffrey Sax Nov 11 '13 at 18:06
@MikeSeymour: To get better values, I have to switch to another random seed generator? – arc_lupus Nov 11 '13 at 18:07
@arc_lupus that would be the case - a so-called high-entropy one. for example, https://www.grc.com/otg/uheprng.htm – OnoSendai Nov 11 '13 at 18:08
@arc_lupus: Yes. If you're using C++11, then the `` library has a few options. The Mersenne Twister (`std::mt19937`) is usually a good choice. – Mike Seymour Nov 11 '13 at 18:10
@MikeSeymour: Do you have a speed comparision between the Twister and rand()? – arc_lupus Nov 11 '13 at 18:10
@arc_lupus: No, but it should be quite simple to measure them. – Mike Seymour Nov 11 '13 at 18:11
@MikeSeymour: Would it be ok to use time() (as above) as a starting seed, and for multithreading time()+"Thread-Number"? – arc_lupus Nov 11 '13 at 18:39
@arc_lupus: Should be fine, if you don't run the program more than once a second; but make sure you only call `time` once, otherwise two threads might end up with the same seed if they get different time values. It might be better to use `std::random_device` (if available) to get a properly random seed. – Mike Seymour Nov 11 '13 at 18:44
@MikeSeymour: If I use `std::random_device` do I have to worry about thread safety? – arc_lupus Nov 11 '13 at 18:47

Speeding up Monte Carlo Pi

3 Answers3