4

I am getting acquainted with pthreads programming; the following code is a simple producer/consumer design where data is put/fetched from a global list. The problem is: the data pointer in the consumer function is tried to be freed twice. Beside, if I add a printf() at the beginning of the loop, everything seems ok... What am I doing wrong? I suspect a misuse of the volatile keyword, or something hidden by the cache... Unless it's just a design issue (which probably is :p).

Thanks for your insights.

Note: malloc()/free() is thread-safe on my system. I am compiling with $ gcc -pthread -O0 which should, I guess, reduce possible design errors due to misuse of volatile. Finally, we don't care in this code snippet with running out of memory (in case of more producer than consumer).

Edit: Changed code to a single list head.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

pthread_mutex_t lock;
pthread_cond_t new_data;

struct data {
  int i;
  struct data *next;
};

struct data *list_head = NULL;

void *consume(void *arg)
{
  struct data *data;

  while (1) {
    pthread_mutex_lock(&lock);
    while (list_head == NULL) {
      pthread_cond_wait(&new_data, &lock);
    }
    data = list_head;
    list_head = list_head->next;
    pthread_mutex_unlock(&lock);
    free(data);
  }

  return NULL;
}

void *produce(void *arg)
{
  struct data *data;

  while (1) {
    data = malloc(sizeof(struct data));
    pthread_mutex_lock(&lock);
    data->next = list_head;
    list_head = data;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&new_data);
  }

  return NULL;
}

int main()
{
  pthread_t tid[2];
  int i;

  pthread_mutex_init(&lock, NULL);
  pthread_cond_init(&new_data, NULL);
  pthread_create(&tid[0], NULL, consume, NULL);
  pthread_create(&tid[1], NULL, produce, NULL);
  for (i = 0; i < 2; i++) {
    pthread_join(tid[i], NULL);
  }
}

And the output:

$ ./a.out 
*** glibc detected *** ./a.out: double free or corruption (fasttop): 0x00007f5870109000 ***
Benny
  • 4,095
  • 1
  • 26
  • 27
  • 1
    You should remove all use of `volatile` in this code. The lock generates the correct memory barriers and such for your platform -- `volatile` is for communication with memory mapped devices and such, not for synchronization. `volatile` prevents reordering or read/write removal by the compiler, but not by the CPU. – Billy ONeal Jan 28 '13 at 20:25
  • 1
    As an aside, I'd also swap the `pthread_cond_signal` and `pthread_mutex_unlock`. – user7116 Jan 28 '13 at 20:55
  • @sixlettervariables I have seen both permutations on different code samples, I can't figure out what's the best practice... I guess that's because the lock is still hold and even though the waiting thread is released from waiting, it still won't be able to get the lock since it has not been released yet? – Benny Jan 28 '13 at 21:07
  • I believe you should unlock then signal. Regardless, your example is *LIFO* but I'll bet you want *FIFO* semantics, am I correct? – user7116 Jan 28 '13 at 21:12
  • @BillyONeal The `volatile` was a qualifier to the pointer type, not to the storage. In `consume()`, `data = list_head;`, are we sure that the compiler will issue code that fetch `list_head` from memory, since `list_head` pointer will be modified outside this this loop? I mean instead of keeping it into a register? – Benny Jan 28 '13 at 21:15
  • @sixlettervariables I don't care at this point, I have changed the code to LIFO for the sake of simplicity. – Benny Jan 28 '13 at 21:20
  • @Benoit: No, the lock itself is supposed to force the compiler to emit a load from memory for that. – Billy ONeal Jan 28 '13 at 21:33
  • @BillyONeal Thanks for the quick answer, do you have any references about that? – Benny Jan 28 '13 at 21:38
  • 2
    @Benoit: http://stackoverflow.com/questions/3208060/does-guarding-a-variable-with-a-pthread-mutex-guarantee-its-also-not-cached <-- Basically, PThreads emits the memory barrier the CPU needs. A compiler can't cache anything stored in a global variable across call to another function, because that other function may modify that global and the compiler does not know that. (if a compiler implements inlining here, then caching the global would have observable side effects and is thus forbidden) – Billy ONeal Jan 28 '13 at 23:20
  • 2
    @Benoit: Signalling the condition variable either before or after the unlock makes no difference in terms of correctness. Signalling it with the lock held is often considered "better" because it ensures that if a higher-priority thread is waiting on the condition variable then a lower-priority thread won't be able to jump in ahead of it (in the window between the unlock and the signal). If you're not using thread priorities, then this won't be a concern. – caf Jan 30 '13 at 12:30

4 Answers4

4

Consider the following scenario:

  1. produce -> acquired lock
  2. consume -> wait lock
  3. produce -> allocate d0, write_ptr = d0, read_ptr = d0, signal, unlock
  4. consume -> acquired lock
  5. produce -> wait lock
  6. consume -> satisfied condition
  7. consume -> data = d0, read_ptr = NULL, unlock
  8. consume -> free d0
  9. produce -> acquired lock, allocate d1
  10. consume -> wait lock
  11. produce -> write_ptr != null so write_ptr->next = d1
  12. produce -> read_ptr == null so read_ptr = d1

Check out step 11. write_ptr is still d0 even though it was free'd independently of the producer thread. You need to make sure consume does not invalidate write_ptr.

A doubly linked list would allow you to avoid some of these difficulties (as the readers and writers work from separate ends).

Main:

  • Create sentinel nodes HEAD and TAIL, link HEAD and TAIL
  • Spawn producer
  • Spawn consumer

Producer:

  • Lock
  • Create node
  • Link HEAD->next->prev to node
  • Link node->next to HEAD->next
  • Link HEAD->next to node
  • Link node->prev to HEAD
  • Unlock
  • Signal

Consumer:

  • Lock
  • Wait for TAIL->prev != HEAD (do { pthread_cond_wait } while (condition);)
  • data = TAIL->prev
  • Link TAIL->prev to data->prev
  • Link TAIL->prev->next to TAIL
  • Unlock
  • Use data
user7116
  • 63,008
  • 17
  • 141
  • 172
  • You are right. That's the problem. He's never setting write_ptr back to NULL again. But wouldn't this behavior be hidden by the bug doron and me pointed out? – junix Jan 28 '13 at 20:49
  • Either way he'll run into using memory that is gone. Which at some point corrupts. – user7116 Jan 28 '13 at 20:50
  • Well that's true. Funny though this results in such "defined" undefined behavior :-) – junix Jan 28 '13 at 20:51
4

I believe that your problem is in the following lines:

if (NULL != write_ptr)
  write_ptr->next = data;
write_ptr = data;
if (NULL == read_ptr)
  read_ptr = data;

I don't think you are building your list correctly. In fact, I don't understand why you have two lists. But anyway...

I assume that you want to add your new data to the head of the list. Otherwise you would need a tail pointer or you would need to chase to the end of the list each time.

To do this you need to add the current list head as the next item of your data. It should look like:

data->next = write_ptr;
write_ptr = data;

There is no need for a NULL check on write_ptr.

Zan Lynx
  • 53,022
  • 10
  • 79
  • 131
  • That wasn't two lists, but two pointers running on the same list. Anyway you're right, my LL implementation was flawed. I've changed it, although it's now LIFO instead of FIFO. – Benny Jan 28 '13 at 21:34
  • @Benoit: Oh, I see now. The write pointer is supposed to be the end of the list while the read pointer is the head of the list. In which case my suggestion is wrong. – Zan Lynx Jan 28 '13 at 23:05
1

Update: As Billy ONeal pointed out, pthread functions provide the needed memory barriers, so it's not necessary to declare anything volatile as long as it's protected by pthread locks. (See this question for details: Does guarding a variable with a pthread mutex guarantee it's also not cached?)

But I got another explaination for some odd behavior: Your linked list the producer creates is broken: Assume write_ptr is not NULL and observe the behavior:

/* 1 */ if (NULL != write_ptr)
/* 2 */   write_ptr->next = data;
/* 3 */ write_ptr = data;

Say write_ptr points to previously allocated instance A. A.next is NULL. We newly allocate an instance B and set everything to NULL (therefore: B.next is NULL).

Line 1 evaluates true, therefore line 2 is executed. A.next is now pointing to B. After executing line 3, write_ptr points to B B.next is NULL => A is lost what results into a memory leak.

But for now I can't see why the libc complains about double free.

Community
  • 1
  • 1
junix
  • 3,161
  • 13
  • 27
1

The error is in the lines

if (NULL != write_ptr)
   write_ptr->next = data;
write_ptr = data;

This should read: if (NULL != write_ptr) data->next = write_ptr; write_ptr = data;

For debugging purposes also make sure that you are getting your expected values out the queue.

Also there is no need for the volatile variables. since your queue is protected by mutexes, the operating system will insure that queue access is atomic. volatile is only needed when accessing memory mapped hardware resources and should never be used for synchronization. All it does is forces data to memory unnecessarily.

There is also one other issue. If your producer is faster than your consumer, you will eventually run out of memory unless you limit the size of the queue and force the producer to wait for the consumer. Smaller queues are more responsive and the only reason to go for larger queues is to smooth out perturbations in the producer.

doron
  • 27,972
  • 12
  • 65
  • 103
  • That change the FIFO design to LIFO, but anyway that's fixing my issue. I changed my LL implementation. Thanks. – Benny Jan 28 '13 at 21:30