Mandelbrot optimization in openmp

Question

Well i have to paralellisize the mandelbrot program in C. I think i have done it well and i cant get better times. My question if someone has an idea to improve the code, ive been thinking perhaps in nested parallel regions between the outer and insider for...

Also i have doubts if its more elegant or recommended to put all the pragmas in a single line or to write separate pragmas ( one for omp parallel and shared and private variables and a conditional, and another pragma with omp for and schedule dynamic).

Ive the doubt if constants can be used as private variables because i think is cleaner to have constants instead of defined variables.

Also i have written a conditional ( if numcpu >1) it has no sense to use parallel region and make a normal sequential execution.

Finally as i have read the dynamic chunk it depends on hardware and your system configuration... so i have left it as a constant, so it can be easily changed.

Also i adapt the number of threads to the number of processors available..

 int main(int argc, char *argv[])
{
    omp_set_dynamic(1);

    int xactual, yactual;

    //each iteration, it calculates: newz = oldz*oldz + p, where p is the current pixel, and oldz stars at the origin
    double pr, pi;                   //real and imaginary part of the pixel p
    double newRe, newIm, oldRe, oldIm;   //real and imaginary parts of new and old z
    double zoom = 1, moveX = -0.5, moveY = 0; //you can change these to zoom and change position

    pixel_t *pixels = malloc(sizeof(pixel_t)*IMAGEHEIGHT*IMAGEWIDTH);
    clock_t begin, end;
    double time_spent;    

    begin=clock();

    int numcpu;
    numcpu = omp_get_num_procs();

    //FILE * fp;
    printf("El número de procesadores que utilizaremos es: %d", numcpu);

    omp_set_num_threads(numcpu);

    #pragma omp parallel shared(pixels, moveX, moveY, zoom) private(xactual, yactual, pr, pi, newRe, newIm) (if numcpu>1)
    {
        //int xactual=0;
    //  int yactual=0;
        #pragma omp for  schedule(dynamic, CHUNK)       

    //loop through every pixel
        for(yactual = 0; yactual < IMAGEHEIGHT; yactual++)
            for(xactual = 0; xactual < IMAGEWIDTH; xactual++)
            {
                //calculate the initial real and imaginary part of z, based on the pixel location and zoom and position values
            pr = 1.5 * (xactual - IMAGEWIDTH / 2) / (0.5 * zoom * IMAGEWIDTH) + moveX;
            pi = (yactual - IMAGEHEIGHT / 2) / (0.5 * zoom * IMAGEHEIGHT) + moveY;
            newRe = newIm = oldRe = oldIm = 0; //these should start at 0,0
            //"i" will represent the number of iterations
            int i;
            //start the iteration process
            for(i = 0; i < ITERATIONS; i++)
            {
                //remember value of previous iteration
                oldRe = newRe;
                oldIm = newIm;
                //the actual iteration, the real and imaginary part are calculated
                newRe = oldRe * oldRe - oldIm * oldIm + pr;
                newIm = 2 * oldRe * oldIm + pi;
                //if the point is outside the circle with radius 2: stop
                if((newRe * newRe + newIm * newIm) > 4) break;
            }

            //            color(i % 256, 255, 255 * (i < maxIterations));
            if(i == ITERATIONS)
            {
                //color(0, 0, 0); // black
                pixels[yactual*IMAGEWIDTH+xactual][0] = 0;
                pixels[yactual*IMAGEWIDTH+xactual][1] = 0;
                pixels[yactual*IMAGEWIDTH+xactual][2] = 0;
            }
            else
            {
                double z = sqrt(newRe * newRe + newIm * newIm);
                int brightness = 256 * log2(1.75 + i - log2(log2(z))) / log2((double)ITERATIONS);

                //color(brightness, brightness, 255)
                pixels[yactual*IMAGEWIDTH+xactual][0] = brightness;
                pixels[yactual*IMAGEWIDTH+xactual][1] = brightness;
                pixels[yactual*IMAGEWIDTH+xactual][2] = 255;
            }      

      }

    }   //end of parallel region

    end= clock();

    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    fprintf(stderr, "Elapsed time: %.2lf seconds.\n", time_spent);

Yes, the way to improve the speed of your code is to use SIMD. I do this with SSE and AVX. Is this for a x86 processor? If you add the SIMD and or SSE or AVX tag you will probably get a better answer. — Z boson, Apr 24 '15 at 07:26
You may also be interested in the code here : https://stackoverflow.com/questions/48069990/multithreaded-simd-vectorized-mandelbrot-in-r-using-rcpp-openmp — Tom Wenseleers, Jan 24 '18 at 16:19

score 0 · Answer 1 · answered Apr 23 '15 at 10:49

0

You could extend the implementation to leverage SIMD extensions. As far as I know the latest OpenMP standard includes vector constructs. Check out this article that describes the new capabilities.

This whitepaper explains how SSE3 can be used when calculating the Mandelbrot set.

answered Apr 23 '15 at 10:49

hayesti

2,993
2
23
34

SIMD is a good suggestion. That's one of the tools I employ to speed up the Mandelbrot calculation. – Z boson Apr 24 '15 at 07:24

Mandelbrot optimization in openmp

1 Answers1