
I am a novice in programming and have just started using pthreads in C. I was curious how much multi-threading improves performance. To test this, I wrote a simple program that calculates the sum of the integers from 0 to n (honestly, I took it from a YouTube video). I gave it some really big numbers to get measurable execution times.

#include<stdio.h>
#include<pthread.h>
long long sum=0,pod=1;
void* sum_run(void* arg)
{
    long long *var_ptr=(long long *)arg;
    long long i,var=*var_ptr;
    for(i=0;i<=var;i++)
    {
        sum+=i;
    }
    pthread_exit(0);
}

void* sum_run2(void* arg)
{
    long long *var_ptr2=(long long *)arg;
    long long j,var2=*var_ptr2;
    for(j=0;j<=var2;j++)
    {
        pod+=j;
    }
    pthread_exit(0);
}

int main(void)
{
    printf("wait getting it...\n");
    long long val=999999999,val2=899999999;
    pthread_t tid[2]; /* two threads need two handles; tid[1] is out of bounds in a 1-element array */
    pthread_create(&tid[0],NULL,sum_run,&val);
    pthread_create(&tid[1],NULL,sum_run2,&val2);
    pthread_join(tid[0],NULL);
    pthread_join(tid[1],NULL);
    printf("sum1 is %lld sum2 is %lld",sum,pod);
}

Oh yes, by mistake I initialized the second long long variable, pod, to 1, which gave me a wrong result (1 more than the desired value). So I corrected my mistake and set pod=0, and here came the PROBLEM: after that change, my program's execution time more than doubled, becoming even longer than that of a program which does the same task without pthreads. I can't work out what is happening inside. Please help.

sum=0, pod=1: execution time ~2.8 s

sum=0, pod=0: execution time ~11.4 s

sum=1, pod=1: execution time jumps to ~25.4 s

Why is it shifting due to changing values?

Also, I found that if one variable is 0 and the other is not, their addresses are not contiguous.

I am using Devcpp's gcc 4.9.2 with the -pthread switch.

Cœur

1 Answer


You are seeing false sharing. When sum and pod are initialized the same way (both zero or both non-zero), they are placed right next to each other in memory, so they end up sharing a cache line.

As each thread tries to modify the cache line, it will find that the other thread has modified it last and the inter-core protocol has to be invoked to transfer ownership of the modified cache line from the other core to this core. The cache line will ping-pong back and forth and the two threads will run at the speed of the inter-core bus -- much worse than the speed of a single thread repeatedly hitting its L1 cache. This phenomenon is called false sharing.

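By initializing one to a non-zero value and leaving the other at zero, you caused them to be allocated in different segments (zero-initialized globals typically go in the BSS segment, non-zero ones in the data segment). This made the false sharing go away, as they are now too far apart to share a cache line.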

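A common solution to this problem is to put some padding variables between the two. For example, you could put them in a struct with long long spacing[7]; between them. With 8-byte long long values, that is 56 bytes of padding, so pod starts 64 bytes after sum and the two can no longer fall in the same 64-byte cache line.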

David Schwartz
  • Are you sure, that this kind of loop is optimized to nothing by any c-compiler available? I doubt that... And if the threads modify the same cache line, why should there be a performance penalty? Sorry, I do not think this answer is a good guess of what is happening. – Ctx Mar 07 '17 at 10:41
  • @Ctx It doesn't matter whether it's optimized or not. The code won't work correctly either way, as I explained. If you don't understand why two threads trying to modify the same cache line cause a performance penalty, you need to study contention. That's the most common type of contention. You can start reading about the issue [here](https://en.wikipedia.org/wiki/False_sharing) or [here](https://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html). – David Schwartz Mar 07 '17 at 10:43
  • Indeed, this might be the issue... Perhaps you should add that setting pod=1 might place it somewhere else in memory than for pod=0, sharing the same cache line then, this would explain the effect. – Ctx Mar 07 '17 at 10:51
  • Why would it slow down just by initialising `pod` to 0 instead of 1? Could it be that zero initialised data is in a different segment (the BSS segment) to non zero initialised data (data segment) and therefore with `pod = 1` the variables are unlikely to be on the same cache line? – JeremyP Mar 07 '17 at 10:52
  • @JeremyP because it moves pod from `bss` section. – LPs Mar 07 '17 at 10:53
  • @DavidSchwartz Yes, I realized that. I cancelled. – LPs Mar 07 '17 at 10:54
  • Thanks man! The problem was false sharing. I tried two methods: the struct one, and another that creates an array of 10 elements and operates on individual elements. From that I found that, to avoid performance bugs like this, the operations must be performed on elements separated by a minimum distance of 7. Also, in the struct case there was no distance problem; even this worked fine: `struct space{ long long sum; int i; long long pod ;}` – Rahul Kumar Mar 07 '17 at 11:50