I created a program that multiplies 8 numbers in pairs using 4 threads and then adds up the four products. How can I ensure that each thread uses a separate core for maximum performance gains? I am new to pthreads, so I really don't have any idea how to use them properly. Please keep answers as simple as possible.

My code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
int global[9];
void *sum_thread(void *arg)
{
    int *args_array;
    args_array = arg;
    int n1,n2,sum;
    n1=args_array[0];
    n2=args_array[1];
    sum = n1*n2;

    printf("N1 * N2 = %d\n",sum);
    return (void*)(intptr_t) sum; /* carry the int result in the pointer value */
}
void *sum_thread1(void *arg)
{
    int *args_array;
    args_array = arg;
    int n3,n4,sum2;
    n3=args_array[2];
    n4=args_array[3];
    sum2=n3*n4;
    printf("N3 * N4 = %d\n",sum2);
    return (void*)(intptr_t) sum2;
}
void *sum_thread2(void *arg)
{
    int *args_array;
    args_array = arg;
    int n5,n6,sum3;
    n5=args_array[4];
    n6=args_array[5];
    sum3=n5*n6;
    printf("N5 * N6 = %d\n",sum3);
    return (void*)(intptr_t) sum3;
}
void *sum_thread3(void *arg)
{
    int *args_array;
    args_array = arg;
    int n8,n7,sum4;
    n7=args_array[6];
    n8=args_array[7];
    sum4=n7*n8;
    printf("N7 * N8 = %d\n",sum4);
    return (void*)(intptr_t) sum4;
}
int main()
{
    void *ret; /* pthread_join stores a void*, so it must not write straight into an int */
    int sum3,sum2,sum,sum4;
    int prod;
    global[0]=9220; global[1]=1110; global[2]=1120; global[3]=2320; global[4]=5100; global[5]=6720; global[6]=7800; global[7]=9290;// the input
    pthread_t tid_sum;
    pthread_create(&tid_sum,NULL,sum_thread,global);
    pthread_join(tid_sum,&ret);
    sum=(int)(intptr_t)ret;
    pthread_t tid_sum1;
    pthread_create(&tid_sum1,NULL,sum_thread1,global);
    pthread_join(tid_sum1,&ret);
    sum2=(int)(intptr_t)ret;
    pthread_t tid_sum2;
    pthread_create(&tid_sum2,NULL,sum_thread2,global);
    pthread_join(tid_sum2,&ret);
    sum3=(int)(intptr_t)ret;
    pthread_t tid_sum3;
    pthread_create(&tid_sum3,NULL,sum_thread3,global);
    pthread_join(tid_sum3,&ret);
    sum4=(int)(intptr_t)ret;
    prod=sum+sum2+sum3+sum4;
    printf("The sum of the products is: %d\n", prod);
    return 0;
}
  • 7
    You don't trust the scheduler in your OS? – Fred Larson Nov 29 '18 at 15:53
  • 5
    You're getting a negative performance gain right now by launching a thread and waiting for it to complete before launching the next one. The whole point of threads is to run them in parallel. Instead, you've serialized everything, with the added overhead of thread creation. – yano Nov 29 '18 at 15:54
  • @yano: Good catch. I didn't even look that far into the code. Maybe you could make an answer out of that? – Fred Larson Nov 29 '18 at 15:56
  • @FredLarson thanks, but it doesn't answer the question though – yano Nov 29 '18 at 16:00
  • @yano how do I fix the issue you just mentioned? – Francesco Bernouli Nov 29 '18 at 16:02
  • @yano how do I make the threads run in parallel? – Francesco Bernouli Nov 29 '18 at 16:04
  • 1
    Start all the threads with consecutive `pthread_create` statements, then later wait for them to finish with consecutive `pthread_join` statements. `pthread_join` causes the calling thread (in this case the main thread) to pause until the thread being joined is done working. So what you have will launch `tid_sum`, then wait for it to finish, then launch `tid_sum1`, and wait for it to finish, etc. Generally you want to launch all your worker threads at the same time, let them work in parallel, wait for them to finish at some later point, then consolidate their work. – yano Nov 29 '18 at 16:06
  • IMO this is a good thread tutorial that would be worth your time to read: https://computing.llnl.gov/tutorials/pthreads/ – yano Nov 29 '18 at 16:09
  • 2
    In terms of your actual question, I think you're getting ahead of yourself. Your OS scheduler is going to be _very good_ at what it does, which is deciding which processes run on which cores. Unless you have an _even better_ reason to override it, don't. Have you analyzed the performance of your code and found that it's simply not fast enough? Premature optimization can lead you down a rabbit hole. In this case, your code is all serialized, so if it's too slow, start by parallelizing it. If you're asking simply out of curiosity, google "processor affinity"; pthreads supports that. – yano Nov 29 '18 at 16:15
  • See: https://unix.stackexchange.com/questions/295447/how-do-i-specify-which-core-a-pthread-is-spawned-on https://stackoverflow.com/questions/1407786/how-to-set-cpu-affinity-of-a-particular-pthread – bhathiya-perera Nov 29 '18 at 16:23
  • 1
    All your threads spend the vast majority of their time printing, something that they can't do concurrently anyway. – David Schwartz Nov 29 '18 at 17:27
  • @FrancescoBernouli please upvote the answer if it was useful – roschach Dec 10 '18 at 14:03
  • Possible duplicate of [how to set CPU affinity of a particular pthread?](https://stackoverflow.com/questions/1407786/how-to-set-cpu-affinity-of-a-particular-pthread) – rustyx Jan 16 '19 at 09:57

1 Answer

You don't have to, don't want to, and shouldn't manage hardware resources at such a low level (it is possible, as the comments on processor affinity point out, but rarely worthwhile). That's a job for your OS and partly for the standard libraries: they have been properly tested, optimized, and standardized.

I doubt you can do better. If you try, either you are an expert hardware/OS programmer or you are undoing decades of work :).

Also consider this: if you assign threads to cores by index, your code is no longer portable, since it depends on the number of cores on your machine.

On the other hand, multithreaded programs should work (and sometimes even work better than single-threaded ones) on a single core. One example is a thread that has nothing to do until an event happens: it can go to "sleep" so that only the other threads use the CPU, and it wakes up and runs when the event happens. A non-multithreaded program generally uses polling instead, which burns CPU time doing nothing.

Also, as @yano said, your multithreaded program is not really parallel in this case, since you create each thread and then wait for it to finish with pthread_join before starting the next one.

roschach