
I have a sample C program for addition (shown below). When I compile and run it with GCC, it uses only one CPU core.

Is there any way to compile a C program so that it can use all CPU cores in Linux?

I compile it like this: gcc -O3 malloc.c

Code:

#include <stdio.h>
#include <stdlib.h>     /* for malloc()/free(); <malloc.h> is non-standard */
#include <time.h>

int main(void) {
        float *ptr;
        unsigned long long i;

        /* 8e9 floats is roughly 32 GB -- check that the allocation succeeded */
        ptr = malloc(8000000000ULL * sizeof(float));
        if (ptr == NULL) {
                perror("malloc");
                return 1;
        }

        for (i = 0; i < 8000000000ULL; i++) {
                ptr[i] = i / 10000;
        }

        clock_t tic = clock();

        for (i = 0; i < 8000000000ULL; i++) {
                ptr[i] = (i / 10000) + 1.0;
        }

        clock_t toc = clock();

        printf("Elapsed: %f seconds\n", (double)(toc - tic) / CLOCKS_PER_SEC);

        free(ptr);
        return 0;
}
Ram Idavalapati

4 Answers


Is there any way to compile a C program so that it can use all CPU cores in Linux?

No, not as magically as you want it to happen. Parallelization of programs is a very difficult subject and in general cannot be done automagically. BTW, parallel programs might not be as efficient as you wish them to be (be aware of Amdahl's law).

However, you could design and code a parallel program. You might for example use POSIX threads. Beware, it is tricky! Read some Pthreads tutorial first. You are not guaranteed that all cores will be used (thread scheduling is handled by the kernel), but in practice it is very likely. Read also about processor affinity.
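
For illustration only, a minimal Pthreads sketch of the question's second loop, split across a fixed number of worker threads, could look like the following (NTHREADS = 4 and the names worker and range are arbitrary choices for this example, not a recommendation):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4                   /* arbitrary; often set to the number of cores */
#define N 8000000000ULL

static float *ptr;                   /* shared array, allocated in main() */

struct range { unsigned long long begin, end; };

static void *worker(void *arg) {
    struct range *r = arg;
    for (unsigned long long i = r->begin; i < r->end; i++)
        ptr[i] = (i / 10000) + 1.0;  /* body of the question's second loop */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    struct range r[NTHREADS];
    unsigned long long chunk = N / NTHREADS;

    ptr = malloc(N * sizeof(float)); /* ~32 GB; must be checked */
    if (ptr == NULL) { perror("malloc"); return 1; }

    for (int t = 0; t < NTHREADS; t++) {
        r[t].begin = (unsigned long long)t * chunk;
        r[t].end = (t == NTHREADS - 1) ? N : r[t].begin + chunk;
        pthread_create(&tid[t], NULL, worker, &r[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);  /* wait for every worker to finish */

    free(ptr);
    return 0;
}

Build it with something like gcc -O3 -pthread file.c. Also note that clock() measures CPU time summed over all threads, so use clock_gettime(CLOCK_MONOTONIC, ...) if you want to see the wall-clock speedup.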

You could also use OpenMP or OpenACC. You could code some of your numerical kernels using OpenCL. You could have a multi-processing approach (e.g. forking several processes, using inter-process communications), perhaps using MPI. Look also into the MapReduce approach, the 0mq library (and many others).

You could read something on OSes, e.g. Operating Systems: Three Easy Pieces. You could also read something on Linux system programming, e.g. Advanced Linux Programming (or some newer book). See also intro(2) and syscalls(2) & pthreads(7).

Be aware that designing, coding and debugging a parallel (or concurrent, or distributed) application is very difficult. Take into account the cost of development time (and the time, probably years, needed to acquire the relevant skills). There is No Silver Bullet!

(It is not very realistic to transform an existing real-life sequential application into a parallel one; you usually have to design a parallel program from scratch.)

Basile Starynkevitch

Try adding the following pragma right above your for loops:

#pragma omp parallel for
for(i=0; i<8000000000; i++) {
    ptr[i] = i/10000;
}

Then add the -fopenmp option to your build options when you call gcc. By default, OpenMP creates as many threads as there are cores on your machine and shares the workload evenly between them.
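
The same pragma also works on the timed loop from the question:

#pragma omp parallel for
for(i=0; i<8000000000; i++) {
    ptr[i] = (i/10000)+1.0;
}

and the build line from the question then becomes:

gcc -O3 -fopenmp malloc.c

If you need a different thread count, it can be overridden at run time with the OMP_NUM_THREADS environment variable (e.g. OMP_NUM_THREADS=4 ./a.out).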

You can check this article for more information on OpenMP.

Genís
  • Good! This changed the time of the second `for` loop from 8.48 seconds to 1.48 seconds. That is 5.7 times faster, which is what one might expect on my i7 CPU with 6 cores. – Thomas Padron-McCarthy Nov 28 '17 at 12:02

You need to create several threads. Otherwise there is only one thread, and it runs on a single core (at a time).

Look at a tutorial about threads, specifically pthreads, to find out how to work with threads. Or you could use the fork system call to split your program into several processes, with one thread each.
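
As a rough sketch of the fork() route (the choice of 4 child processes and the anonymous shared mapping are assumptions for this example): unlike threads, child processes do not share the parent's heap, so the array has to live in shared memory, e.g. via mmap:

#define _DEFAULT_SOURCE              /* for MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define N     8000000000ULL
#define NPROC 4                      /* arbitrary number of worker processes */

int main(void) {
    /* MAP_SHARED so the parent sees what the children write */
    float *ptr = mmap(NULL, N * sizeof(float), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long long chunk = N / NPROC;
    for (int p = 0; p < NPROC; p++) {
        if (fork() == 0) {           /* child: fill its own slice */
            unsigned long long begin = (unsigned long long)p * chunk;
            unsigned long long end = (p == NPROC - 1) ? N : begin + chunk;
            for (unsigned long long i = begin; i < end; i++)
                ptr[i] = (i / 10000) + 1.0;
            _exit(0);
        }
    }
    for (int p = 0; p < NPROC; p++)
        wait(NULL);                  /* reap all children */

    munmap(ptr, N * sizeof(float));
    return 0;
}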

Thomas Padron-McCarthy

You can create n threads in your program (where n is the number of cores) and then set the CPU affinity of each thread so that it is tied to a particular CPU core. sched_setaffinity or pthread_setaffinity_np is what allows you to set CPU affinity.
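
For example, pinning a thread to one core with pthread_setaffinity_np could look like this (pin_to_core is a made-up helper name for the sketch; _GNU_SOURCE is required on glibc):

#define _GNU_SOURCE                  /* exposes pthread_setaffinity_np and CPU_* macros */
#include <pthread.h>
#include <sched.h>

/* Tie the given thread to a single CPU core; returns 0 on success. */
static int pin_to_core(pthread_t thread, int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

Each worker can pin itself with pin_to_core(pthread_self(), its_index); sched_setaffinity does the same for a process (or a thread identified by its TID) rather than a pthread_t.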

iam.Carrot