core dumped when using malloc

Question

I have 2 threads in a process. One mallocs and writes packets to a global linked list. The other keeps reading packets from the global linked list, sends them out through a hardware call and then frees the memory. This piece of code handles a really large number of packets at a high rate.

Everything works fine except for one isolated case, where the process aborted due to what seems to be a failed malloc. This is strange because the man page for malloc says that if malloc fails, it just returns NULL. Is there any other way a call to malloc() could fail that would cause the process to crash, as in my case?

Here is the backtrace from gdb:

#0  0xffffe430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xf757cc10 in raise () from /lib/libc.so.6
No symbol table info available.
#2  0xf757e545 in abort () from /lib/libc.so.6
No symbol table info available.
#3  0xf75b94e5 in __libc_message () from /lib/libc.so.6
No symbol table info available.
#4  0xf75bf3d4 in malloc_printerr () from /lib/libc.so.6
No symbol table info available.
#5  0xf75c1f5a in _int_malloc () from /lib/libc.so.6
No symbol table info available.
#6  0xf75c3dd4 in malloc () from /lib/libc.so.6
No symbol table info available.
#7  0x080a2466 in np_enqueue_packet_to_tx_queue (prio=2, pkt_type=1 '\001', tx_host_handle=162533812, packet_length=40, pTxData=0x14dfa694 "", dlci=474, vfport=71369178) at ./np_tx.c:173
No locals.

Here is the code of the sender thread, whose malloc fails. The sender thread mallocs memory (operation protected by mutex) and writes onto the global queue (also protected by mutex). When the core dump happened, from gdb I can see that the first malloc was successful, and the second one failed and caused the core dump.

void np_enqueue_packet_to_tx_queue(int prio, WP_U8 pkt_type,
                               WP_handle tx_host_handle,
                               WP_S32 packet_length, WP_CHAR *pTxData,
                               WP_U32 dlci, WP_U32 vfport)
{
    STRU_TX_QUEUE_NODE *packetToSend;
    packetToSend = malloc(sizeof(STRU_TX_QUEUE_NODE));
    if (packetToSend == NULL)
    {
        WDDI_ERR(" Cannot allocate new memory in np_enqueue_packet_to_tx_queue\n");
        return;
    }
    memset(packetToSend, 0, sizeof(STRU_TX_QUEUE_NODE));
    packetToSend->packet = (WP_CHAR*)malloc(packet_length);
    if (packetToSend->packet == NULL)
    {
        WDDI_ERR(" Cannot allocate new memory in np_enqueue_packet_to_tx_queue\n");
        free(packetToSend);
        packetToSend = NULL;
        return;
    }
    memset(packetToSend->packet, 0, packet_length);
    packetToSend->pkt_type = pkt_type;
    packetToSend->packet_length = packet_length;
    memcpy(packetToSend->packet, pTxData, packet_length);
    if (pkt_type == PACKET_TYPE_FR)
    {
        packetToSend->fr_tx_info.tx_host_handle = tx_host_handle;
        packetToSend->fr_tx_info.dlci = dlci;
        packetToSend->fr_tx_info.vfport = vfport;
    }
    pthread_mutex_lock(&tx_queue_mutex);
    if (prio == PRIO_HIGH)
    {
        write_packet_to_tx_queue(&high_prio_tx_queue_g, packetToSend);
    }
    else
    {
        write_packet_to_tx_queue(&low_prio_tx_queue_g, packetToSend);
    }
    pthread_mutex_unlock(&tx_queue_mutex);
    // wakeup Tx thread
    pthread_cond_signal(&tx_queue_cond);
}

Can someone help point out what may have gone wrong here?

And here is the code for the reader thread. It reads some data from the global queue (operation protected by mutex), releases the mutex, does some processing with the data, and then frees the memory of the data (this operation not protected by mutex).

void *tx_thread(void *arg)
{
    STRU_TX_QUEUE_NODE *pickedUpPackets[TX_NUM_PACKETS_BUFFERED];
    int read_counter, send_counter;

    while (1)
    {
        pthread_mutex_lock(&tx_queue_mutex);
        while ((high_prio_tx_queue_g.len == 0) && (low_prio_tx_queue_g.len == 0))
        {
            pthread_cond_wait(&tx_queue_cond, &tx_queue_mutex);
        }
        if (high_prio_tx_queue_g.len)
        {
            for (read_counter = 0; read_counter < TX_NUM_PACKETS_BUFFERED; read_counter++)
            {
                pickedUpPackets[read_counter] = read_packet_from_tx_queue(&high_prio_tx_queue_g);
                if (pickedUpPackets[read_counter] == NULL)
                {
                    break;
                }
            }
        }
        else if (low_prio_tx_queue_g.len)
        {
            for (read_counter = 0; read_counter < TX_NUM_PACKETS_BUFFERED; read_counter++)
            {
                pickedUpPackets[read_counter] = read_packet_from_tx_queue(&low_prio_tx_queue_g);
                if (pickedUpPackets[read_counter] == NULL)
                {
                    break;
                }
            }
        }
        pthread_mutex_unlock(&tx_queue_mutex);
        for (send_counter = 0; send_counter < read_counter; send_counter++)
        {
            np_host_send(pickedUpPackets[send_counter]);
        }
    }
}

void np_host_send(STRU_TX_QUEUE_NODE *packetToSend)
{
    if (packetToSend == NULL)
    {
        return;
    }

    // some hardware calls 

    free(packetToSend->packet);
    packetToSend->packet = NULL;
    free(packetToSend);
    packetToSend = NULL;
}
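
For reference, the hand-off described above (allocate in the sender, dequeue under the mutex, free in the reader after the mutex is released) can be reduced to the minimal sketch below. It assumes a pthreads build against a C library whose malloc() and free() are thread-safe, as glibc's are. The names `demo_node`, `producer`, `consumer` and `run_handoff_demo` are hypothetical and are not part of the original code; `demo_node` only stands in for STRU_TX_QUEUE_NODE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type standing in for STRU_TX_QUEUE_NODE. */
typedef struct demo_node {
    struct demo_node *next;
    char *packet;
    size_t packet_length;
} demo_node;

static demo_node *head, *tail;          /* queue, protected by q_mutex */
static int producer_done;               /* also protected by q_mutex */
static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;

/* Allocates nodes outside the lock, then links them in under the lock. */
static void *producer(void *arg)
{
    int count = *(int *)arg;
    static const char payload[40] = "demo";

    for (int i = 0; i < count; i++) {
        demo_node *n = calloc(1, sizeof *n);
        assert(n != NULL);
        n->packet = malloc(sizeof payload);
        assert(n->packet != NULL);
        memcpy(n->packet, payload, sizeof payload);
        n->packet_length = sizeof payload;

        pthread_mutex_lock(&q_mutex);
        if (tail != NULL)
            tail->next = n;
        else
            head = n;
        tail = n;
        pthread_cond_signal(&q_cond);
        pthread_mutex_unlock(&q_mutex);
    }

    pthread_mutex_lock(&q_mutex);
    producer_done = 1;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_mutex);
    return NULL;
}

/* Unlinks one node under the lock, then frees it after unlocking. */
static void *consumer(void *arg)
{
    int *freed = arg;

    for (;;) {
        pthread_mutex_lock(&q_mutex);
        while (head == NULL && !producer_done)
            pthread_cond_wait(&q_cond, &q_mutex);
        demo_node *n = head;
        if (n != NULL) {
            head = n->next;
            if (head == NULL)
                tail = NULL;
        }
        pthread_mutex_unlock(&q_mutex);

        if (n == NULL)
            break;              /* queue drained and producer finished */

        /* The node is no longer reachable from the queue, so no lock is needed. */
        free(n->packet);
        free(n);
        (*freed)++;
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the number of nodes freed. */
int run_handoff_demo(int count)
{
    pthread_t prod, cons;
    int freed = 0;

    head = tail = NULL;
    producer_done = 0;
    pthread_create(&cons, NULL, consumer, &freed);
    pthread_create(&prod, NULL, producer, &count);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return freed;
}
```

Calling `run_handoff_demo()` from a small `main()` and running it under Valgrind, or building it with `-fsanitize=thread`, is one way to check that this pattern by itself is clean. If it is, the corruption more likely comes from a write past the end of some allocated block elsewhere in the program.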

Note that a crash in malloc or free is usually indicative of a bug which has previously corrupted the heap. — Paul R, Apr 16 '14 at 14:03
Why do people use malloc() + memset() instead of just calloc()? — John Zwinck, Apr 16 '14 at 14:06
I'm actually not sure if there are any guarantees that malloc() is implemented as a thread-safe function...? — Lundin, Apr 16 '14 at 14:07
@Lundin: I believe that all the C standard library is thread-safe these days, apart from specific well-documented exceptions. Of course that may just be for gcc etc, and not universally true. — Paul R, Apr 16 '14 at 14:11
@PaulR Normally I don't think it would be thread safe, but the internet seems to suggest that if you compile with -pthreads on gcc, then malloc will turn thread-safe, as some gcc-specific feature. So, likely that's not an issue here. — Lundin, Apr 16 '14 at 14:15
As @PaulR said, it's most probably caused by heap corruption, possibly by memset-ting or strcpy-ing more data to the allocated block than the block length. — CiaPan, Apr 16 '14 at 14:22
Valgrind this. And assuming this is using pthreads, [`malloc` is ok](http://stackoverflow.com/questions/855763/is-malloc-thread-safe) (though your cast isn't). Paul is correct about you likely toasting the free-chain somewhere else in your code, and valgrind will usually find where in short order. And unrelated, the `memset` of `packetToSend->packet` is pointless, since you're about to `memcpy` over every one of those allocated bytes later on. — WhozCraig, Apr 16 '14 at 14:28
Where do you free() the data? Is the free() call protected by the same mutex? Otherwise, you'd get a heap corruption when free() is executing and you get a context switch to a thread which starts doing things with the global list. — Lundin, Apr 16 '14 at 14:32
@Lundin: Good point! I am freeing the data in the other (reader) thread and it is not protected by the same mutex. So, the scenario is -> writer thread mallocs protected by my mutex, and reader thread reads protected by mutex, unlocks the mutex and then frees the memory. Given that malloc and free are thread safe, would my scenario be a problem? In other words, given that malloc and free are thread safe, does that mean that it is safe to malloc in one thread and free the memory in another thread without user level mutex protection? — JuhiS, Apr 17 '14 at 03:08
@WhozCraig: What did you mean when you said the cast isn't ok? — JuhiS, Apr 17 '14 at 03:30
@user3541253 See this regarding `malloc` : ["Do I cast the result of `malloc`?"](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) — WhozCraig, Apr 17 '14 at 05:25
Which line is `np_tx.c:173`, mentioned in the error message? — Vorsprung, Apr 17 '14 at 08:20
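
Several comments above point to earlier heap corruption as the likely cause. One low-tech way to narrow down which buffer is being overrun is to pad each allocation with guard bytes and check them before freeing. The sketch below only illustrates that idea: `guarded_alloc`, `guard_intact`, `GUARD_SIZE` and `GUARD_BYTE` are hypothetical names, not part of the original code or of any library. Valgrind or a build with `-fsanitize=address` does the same job far more thoroughly.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8        /* number of guard bytes appended to each block */
#define GUARD_BYTE 0xA5     /* arbitrary pattern, unlikely to appear by accident */

/* Hypothetical helper: returns `len` usable bytes followed by GUARD_SIZE
 * guard bytes. The caller must remember `len` to check the guard later. */
void *guarded_alloc(size_t len)
{
    if (len > SIZE_MAX - GUARD_SIZE)
        return NULL;                    /* len + GUARD_SIZE would overflow */

    unsigned char *p = malloc(len + GUARD_SIZE);
    if (p == NULL)
        return NULL;

    memset(p + len, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Returns 1 if the guard bytes after the first `len` bytes are untouched,
 * 0 if something has written past the usable part of the block. */
int guard_intact(const void *ptr, size_t len)
{
    const unsigned char *guard = (const unsigned char *)ptr + len;

    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

In the code from the question, this would mean allocating the packet buffer with `guarded_alloc(packet_length)` and checking `guard_intact(packetToSend->packet, packetToSend->packet_length)` in `np_host_send()` just before the `free()`. A failed check means the block was overrun between enqueue and send. The block can still be released with plain `free()`, since the pointer returned is the one `malloc()` produced.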
