
I have a sample C program for addition (shown below). When I compile and run it with GCC, it uses only one CPU core.

Is there any way to compile a C program so that it can use all CPU cores in Linux?

I compile it like this: gcc -O3 malloc.c

Code:

#include <stdio.h>
#include <stdlib.h>     /* for malloc()/free(); <malloc.h> is non-standard */
#include <time.h>

int main(void) {
        float *ptr;
        unsigned long long i;

        /* 8e9 floats is roughly 32 GB -- check that the allocation succeeded */
        ptr = malloc(8000000000ULL * sizeof(float));
        if (ptr == NULL) {
                perror("malloc");
                return 1;
        }

        for (i = 0; i < 8000000000ULL; i++) {
                ptr[i] = i / 10000;
        }

        clock_t tic = clock();

        for (i = 0; i < 8000000000ULL; i++) {
                ptr[i] = (i / 10000) + 1.0;
        }

        clock_t toc = clock();

        printf("Elapsed: %f seconds\n", (double)(toc - tic) / CLOCKS_PER_SEC);

        free(ptr);
        return 0;
}
Ram Idavalapati

4 Answers


Is there any way to compile a C program so that it can use all CPU cores in Linux?

No, not as magically as you want it to happen. Parallelization of programs is a very difficult subject and in general cannot be done automagically. BTW, parallel programs might not be as efficient as you wish them to be (be aware of Amdahl's law).

However, you could design and code a parallel program. You might for example use POSIX threads. Beware, it is tricky! Read some Pthreads tutorial first. You are not guaranteed that all cores will be used (thread scheduling is handled by the kernel), but in practice it is very likely. Read also about processor affinity.
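
For illustration only, a minimal Pthreads sketch of the question's second loop, split across a fixed number of worker threads, could look like the following (NTHREADS = 4 and the names worker and range are arbitrary choices for this example, not a recommendation):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4                   /* arbitrary; often set to the number of cores */
#define N 8000000000ULL

static float *ptr;                   /* shared array, allocated in main() */

struct range { unsigned long long begin, end; };

static void *worker(void *arg) {
    struct range *r = arg;
    for (unsigned long long i = r->begin; i < r->end; i++)
        ptr[i] = (i / 10000) + 1.0;  /* body of the question's second loop */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    struct range r[NTHREADS];
    unsigned long long chunk = N / NTHREADS;

    ptr = malloc(N * sizeof(float)); /* ~32 GB; must be checked */
    if (ptr == NULL) { perror("malloc"); return 1; }

    for (int t = 0; t < NTHREADS; t++) {
        r[t].begin = (unsigned long long)t * chunk;
        r[t].end = (t == NTHREADS - 1) ? N : r[t].begin + chunk;
        pthread_create(&tid[t], NULL, worker, &r[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);  /* wait for every worker to finish */

    free(ptr);
    return 0;
}

Build it with something like gcc -O3 -pthread file.c. Also note that clock() measures CPU time summed over all threads, so use clock_gettime(CLOCK_MONOTONIC, ...) if you want to see the wall-clock speedup.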

You could also use OpenMP or OpenACC. You could code some of your numerical kernels using OpenCL. You could have a multi-processing approach (e.g. forking several processes, using inter-process communications), perhaps using MPI. Look also into the MapReduce approach, the 0mq library (and many others).

You could read something on OSes, e.g. Operating Systems: Three Easy Pieces. You could also read something on Linux system programming, e.g. Advanced Linux Programming (or some newer book). See also intro(2) and syscalls(2) & pthreads(7).

Be aware that designing, coding and debugging a parallel (or concurrent, or distributed) application is very difficult. Take into account the cost of development time (and the time, probably years, needed to acquire the relevant skills). There is No Silver Bullet!

(It is not very realistic to transform an existing real-life sequential application into a parallel one; you usually have to design a parallel program from scratch.)

Basile Starynkevitch

Try adding the following pragma right above your for loops:

#pragma omp parallel for
for(i=0; i<8000000000; i++) {
    ptr[i] = i/10000;
}

Then add the -fopenmp option to your build options when you call gcc. By default, OpenMP creates as many threads as there are cores on your machine and shares the workload evenly between them.
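
The same pragma also works on the timed loop from the question:

#pragma omp parallel for
for(i=0; i<8000000000; i++) {
    ptr[i] = (i/10000)+1.0;
}

and the build line from the question then becomes:

gcc -O3 -fopenmp malloc.c

If you need a different thread count, it can be overridden at run time with the OMP_NUM_THREADS environment variable (e.g. OMP_NUM_THREADS=4 ./a.out).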

You can check this article for more information on OpenMP.

Genís
  • Good! This changed the time of the second `for` loop from 8.48 seconds to 1.48 seconds. That is 5.7 times faster, which is what one might expect on my i7 CPU with 6 cores. – Thomas Padron-McCarthy Nov 28 '17 at 12:02

You need to create several threads. Otherwise there is only one thread, and it runs on a single core (at a time).

Look at a tutorial about threads, specifically pthreads, to find out how to work with threads. Or you could use the fork system call to split your program into several processes, with one thread each.
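
As a rough sketch of the fork() route (the choice of 4 child processes and the anonymous shared mapping are assumptions for this example): unlike threads, child processes do not share the parent's heap, so the array has to live in shared memory, e.g. via mmap:

#define _DEFAULT_SOURCE              /* for MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define N     8000000000ULL
#define NPROC 4                      /* arbitrary number of worker processes */

int main(void) {
    /* MAP_SHARED so the parent sees what the children write */
    float *ptr = mmap(NULL, N * sizeof(float), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long long chunk = N / NPROC;
    for (int p = 0; p < NPROC; p++) {
        if (fork() == 0) {           /* child: fill its own slice */
            unsigned long long begin = (unsigned long long)p * chunk;
            unsigned long long end = (p == NPROC - 1) ? N : begin + chunk;
            for (unsigned long long i = begin; i < end; i++)
                ptr[i] = (i / 10000) + 1.0;
            _exit(0);
        }
    }
    for (int p = 0; p < NPROC; p++)
        wait(NULL);                  /* reap all children */

    munmap(ptr, N * sizeof(float));
    return 0;
}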

Thomas Padron-McCarthy

You can create n threads in your program (where n is the number of cores) and then set the CPU affinity of each thread so that it is tied to a particular CPU core. sched_setaffinity or pthread_setaffinity_np is what allows you to set CPU affinity.
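
For example, pinning a thread to one core with pthread_setaffinity_np could look like this (pin_to_core is a made-up helper name for the sketch; _GNU_SOURCE is required on glibc):

#define _GNU_SOURCE                  /* exposes pthread_setaffinity_np and CPU_* macros */
#include <pthread.h>
#include <sched.h>

/* Tie the given thread to a single CPU core; returns 0 on success. */
static int pin_to_core(pthread_t thread, int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

Each worker can pin itself with pin_to_core(pthread_self(), its_index); sched_setaffinity does the same for a process (or a thread identified by its TID) rather than a pthread_t.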

iam.Carrot