I have a program with only 2 threads. One is the main thread, the second one is used as a "music processor". The music processor is initially sleeping on a condition variable by calling pthread_cond_wait. The main thread puts the data to be processed by the other thread in a shared variable and wakes up the thread using pthread_cond_signal.

I built this program on an Ubuntu 16 system and it ran perfectly. I then built a GNU system, following the LFS 8.2 instructions, with the latest Linux kernel and glibc 2.17. This is the system on which I need the program to run.

On this system the music processing thread always fails at the call to pthread_cond_wait with the message "The futex facility returned an unexpected error code". What could be causing this? I've looked all over and can't find any explanation.

EDIT

Here's the simplified code:

struct _audio_processor {
/*  Other variables
        .
        .
        .
 */
    pthread_mutex_t     frameAdvanceLock,
                queuedEffectsLock;
    pthread_cond_t      frameAdvanceBarrier;
} __attribute__((packed));

typedef struct _audio_processor * AudioProcessor;

static void * _playbackThreadBody ( register void * p )
{
    register AudioProcessor processor = NULL;

    processor = (AudioProcessor) p;

    while ( processor->audioThreadRunning ) {
        pthread_mutex_lock ( & processor->frameAdvanceLock );
/* This is where it INVARIABLY fails. */
        pthread_cond_wait ( & processor->frameAdvanceBarrier, & processor->frameAdvanceLock );
        pthread_mutex_lock ( & processor->frameAdvanceLock );
/*  Rest of the thread (stuff happens here that takes time).
        .
        .
        .
*/
}

    return NULL;
};

AudioProcessor CreateAudioProcessor ( void )
{
    register AudioProcessor result = NULL;
    register int        status = -1;
    pthread_attr_t      attributes;

    result = & _mainAudioProcessor;
/*
    Other variables initialized here.
        .
        .
        .
 */
    pthread_cond_init ( & result->frameAdvanceBarrier, NULL);
    pthread_mutex_init ( & result->frameAdvanceLock, NULL );
    pthread_mutex_init ( & result->queuedEffectsLock, NULL );
    pthread_attr_init ( & attributes );
    pthread_attr_setstacksize ( & attributes, 8192 );
    status = pthread_create ( & _audioProcessorThread, & attributes,     _playbackThreadBody, result );
    sched_yield ();

    return result;
};

void AudioProcessorPlaybackMusic ( register const AudioProcessor processor )
{
    register int    status = -1;

    pthread_cond_signal ( & processor->frameAdvanceBarrier );
};
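
For comparison, here is a minimal sketch of the conventional hand-off pattern the question describes: the waiter checks a predicate in a loop and unlocks (rather than re-locks) the mutex after pthread_cond_wait returns. The struct and function names (handoff, handoff_init, handoff_post, handoff_wait) are hypothetical and not taken from the question, and this sketch is not claimed to address the futex error itself.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical hand-off object; names are illustrative, not from the question. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    bool            has_work;   /* the predicate the condition variable guards */
};

/* Returns 0 on success, or the pthread error code. */
int handoff_init(struct handoff *h)
{
    int rc = pthread_mutex_init(&h->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&h->ready, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&h->lock);
        return rc;
    }
    h->has_work = false;
    return 0;
}

/* Main thread: publish work, then wake the waiter. */
void handoff_post(struct handoff *h)
{
    pthread_mutex_lock(&h->lock);
    h->has_work = true;
    pthread_cond_signal(&h->ready);
    pthread_mutex_unlock(&h->lock);
}

/* Worker thread: block until work has been posted, then consume it. */
void handoff_wait(struct handoff *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->has_work)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&h->ready, &h->lock);
    h->has_work = false;
    pthread_mutex_unlock(&h->lock); /* cond_wait returns with the lock held */
}
```

Because the predicate is set under the lock before signalling, a signal sent before the worker starts waiting is not lost.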
  • Post some code that demonstrates this, and we can take a look. – Terry Carmen Aug 23 '18 at 17:54
  • It's not well documented, but some people have found that it was caused by passing in a bad pointer to pthread_cond_wait. You might start there and see if you're passing in valid data. – Terry Carmen Aug 23 '18 at 18:05
  • I'll post a simplified version of the code in a bit (it's rather large, line-wise). I thought that could be the problem so I checked both the condition variable itself and the mutex. Both are properly initialized and the mutex is locked. – Jamie Ramone Aug 23 '18 at 20:47

1 Answer

This can happen if the "pthread_cond_t" memory is not aligned to a DWORD (4-byte) boundary, for example when it is placed at an arbitrary offset inside a heap block. Code to demonstrate this is provided below. Note: in my tests the "pthread_mutex_t" apparently also had to be on the heap for the issue to be observed.

It seems that "malloc()" guarantees suitable alignment (see: Why does Malloc() care about boundary alignments?). I was getting the error intermittently when using my own debug heap, which doesn't provide this guarantee.
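
If a custom allocator cannot promise alignment, one option is to request it explicitly. The following is a minimal sketch, assuming a POSIX system where posix_memalign is available; the helper name alloc_aligned_cond is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a pthread_cond_t at an address that satisfies
   the type's own alignment requirement, then initialise it.
   Returns NULL on failure; the caller destroys and frees the result. */
pthread_cond_t *alloc_aligned_cond(void)
{
    void  *mem   = NULL;
    size_t align = _Alignof(pthread_cond_t);

    /* posix_memalign requires a power-of-two multiple of sizeof(void *). */
    if (align < sizeof(void *))
        align = sizeof(void *);

    if (posix_memalign(&mem, align, sizeof(pthread_cond_t)) != 0)
        return NULL;

    if (pthread_cond_init(mem, NULL) != 0) {
        free(mem);
        return NULL;
    }
    return mem;
}
```

Release the object with pthread_cond_destroy followed by free.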

I'm not sure whether alignment is guaranteed for locals/globals, although it can probably be specified as part of the struct definition (see: Structure alignment in GCC (should alignment be specified in typedef?)). The examples at https://linux.die.net/man/3/pthread_cond_init certainly embed these objects in larger structs, which presumably preserves alignment unless struct packing overrides it.
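
To illustrate how packing can override member alignment, here is a small sketch comparing two hypothetical structs (natural_layout and packed_layout are made-up names; the leading char stands in for whatever members precede the condition variable). It relies on the GCC/Clang __attribute__((packed)) extension, which the struct in the question also uses, so this may be worth checking there.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical structs for illustration only. */
struct natural_layout {
    char           flag;
    pthread_cond_t cond;   /* compiler inserts padding before this member */
};

struct packed_layout {
    char           flag;
    pthread_cond_t cond;   /* no padding: starts right after `flag` */
} __attribute__((packed));

/* Byte offset of `cond` within each layout. */
size_t natural_cond_offset(void)
{
    return offsetof(struct natural_layout, cond);
}

size_t packed_cond_offset(void)
{
    return offsetof(struct packed_layout, cond);
}
```

In the packed layout the condition variable sits at offset 1, so even if the struct itself starts on an aligned address, the member does not.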

--> Maybe these structs are meant to be allocated dynamically? If so the docs should specify that, or at least the alignment requirement?

--> Maybe the "pthread_cond_init()" (or the subsequent "wait") function(s) should raise an error early on if the required alignment isn't satisfied, rather than just dumping a cryptic message to the console and killing the process? (Having said that, the error only happens when both structs are on the heap.)
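
Until the library does this itself, an early check can be sketched in user code. checked_cond_init below is a hypothetical wrapper, not part of pthreads. It tests against the type's declared alignment, which may be stricter than the 4-byte boundary observed in the experiment below.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical wrapper (not a pthreads function): refuse to initialise a
   condition variable in memory that is NULL or not aligned for the type.
   Returns EINVAL in that case, otherwise the result of pthread_cond_init. */
int checked_cond_init(void *cond_mem, const pthread_condattr_t *attr)
{
    if (cond_mem == NULL ||
        (uintptr_t)cond_mem % _Alignof(pthread_cond_t) != 0)
        return EINVAL;

    return pthread_cond_init((pthread_cond_t *)cond_mem, attr);
}
```

Taking a void * avoids forming a misaligned pthread_cond_t * before the check has run.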

Code to demonstrate the error / success, with different alignments:

        // This has to be aligned on a DWORD boundary, tested on Kubuntu 20.04:
        int const alignmentRequirement = sizeof(int);
        typedef long TPointerSize;  // (Assuming 64bit addresses.)

        // Use malloc with its logic, however can apply an n-byte offset to the struct used below:
        #define ALLOC_WITH_OFFSET(OFFSET)        ((pthread_cond_t*)((OFFSET) + (char*)malloc((OFFSET) + sizeof(pthread_cond_t))))

        // Uncomment only one of these lines to observe the behaviour:
        pthread_cond_t* condition_var = (pthread_cond_t*)malloc(sizeof(pthread_cond_t));               //--    (A) Works:  (I.e. waits forever.)  malloc() returns 'aligned' values.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(0);                                        //--    (B) Works:  Same as malloc, above.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(1);                                        //--    (C) FAILS:  Cryptic error message, process killed.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(2);                                        //--    (D) FAILS:  As above.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(3);                                        //--    (E) FAILS:  As above.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(4);                                        //--    (F) Works!! (I.e. waits forever.)  A whole number of "DWORD"s past the normal malloc() alignment:
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(5);                                        //--    (G) FAILS:  As above.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(6);                                        //--    (H) FAILS:  As above.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(7);                                        //--    (I) FAILS:  As above.
    //    pthread_cond_t* condition_var = ALLOC_WITH_OFFSET(8);                                        //--    (J) Works!! (I.e. waits forever.)  A whole number of "DWORD"s past the normal malloc() alignment:
        printf("Created 'pthread_cond_t', address = 0x%lx,  mis-alignment = %ld bytes\n", (TPointerSize)condition_var, ((TPointerSize)condition_var) % alignmentRequirement);  // (Assuming 64bit addresses.)

        // Use that to wait on a mutex...
    //    pthread_mutex_t mutex;                                                                //-- Note that with the mutex as a local, none of the above cause the issue...
        pthread_mutex_t* mutex = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t));
        pthread_mutexattr_t    attr;
        /*int res =*/ pthread_mutexattr_init(&attr);                                            //-- (Check return value is zero.)
        /*int res =*/ pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);                //-- (Check return value is zero.)

        printf("Waiting forever, this will kill the process when the error happens (or will wait forever if is ok - then you will have to kill the process):\n");
        /*int res =*/ pthread_mutex_lock(/*&*/mutex);                                           //-- Grab the mutex first, always works.    (Check return value is zero.)
        // Error message is   "The futex facility returned an unexpected error code.":
        /*int res =*/ pthread_cond_wait(condition_var, /*&*/mutex);                             //-- (Check return value is zero.)

        // Not showing cleanup/free of memory...

PhilD
  • This solution worked for me. It turns out I was placing my pthread_cond_t at an address that was not on a DWORD boundary: instead of a multiple of 4 bytes, it landed on the 3rd byte or so, and pthread_cond_wait did not like that. To solve this, I took the amount of memory I was supposed to allocate (which could have been an odd number), took it modulo 8 (to be safe), and subtracted the remainder so the size is divisible by 8, and it all worked out for me. BUFFER_SIZE = 6523; /* This fixed it: */ BUFFER_SIZE -= BUFFER_SIZE % 8; – laundry Mar 26 '23 at 01:37
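
The size adjustment described in the comment above can be written as a small helper. This is a sketch with hypothetical names (round_down_to, round_up_to); rounding up is included as an alternative for cases where shrinking the buffer is not acceptable.

```c
#include <stddef.h>

/* Hypothetical helpers; `align` must be non-zero. */

/* Largest multiple of `align` that is <= n (what the comment does). */
size_t round_down_to(size_t n, size_t align)
{
    return n - n % align;
}

/* Smallest multiple of `align` that is >= n. */
size_t round_up_to(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}
```

For the value in the comment, round_down_to(6523, 8) gives 6520 and round_up_to(6523, 8) gives 6528.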