0

EDIT: What I thought was a bug-correction was nothing but a random fix. I was forgetting to free up heap memory correctly. See the actual code and explanation at the bottom (I'm not sure if I should have edited the whole question).

I haved fixed a bug, but do not understand what it's causing it.

If I use a helper pointer to access the element of a list in the heap, and modify its contents, everything's fine. However, if I directly modify the contents of the element using a function that returns that pointer, then this data gets corrupted at a later stage when allocating more memory.

The error only appears when the code is run in a loop, on the 6th cycle.

Code does not work

// numEvents gets corrupted after some cycles
lstEvent_NewList(&lstEvent, 1, &err); // creates a list of typeEvent in heap
//...
lstEvent_FirstNode(&lstEvent)->numEvents = events; // access first element of list directly
//...
typeCommand * p = 0;
p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
lstField_NewList(&p->lstFields, 5, &err); // THIS LINE CAUSES CORRUPTION IN numEvents

Code does work

typeEvent * pEvent;   // use an auxiliar to iterate through the list
lstEvent_NewList(&lstEvent, 1, &err);
//...
pEvent = lstEvent_FirstNode(&lstEvent);
pEvent->numEvents = events;
//...
typeCommand * p = 0;
p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
lstField_NewList(&p->lstFields, 5, &err); // all is good

So, basically if I use an auxiliar to get the first element of the list nothing happens. However, if I use the function to directly access the first element, it breaks later on in the program.

Structure and function definitions:

typeEvent * lstEvent_FirstNode (LSTEvent *lst)
{
    return lst->ls;
}

void lstEvent_NewList (LSTEvent * lst, uint16_t size, uint8_t * err)
{
    typeEvent* ret = 0;
    ret = (typeEvent*)malloc(sizeof(typeEvent)*size);
    if (ret == 0)
    {
        *err = E_RUN_OUT_OF_MEM;
        lst->cn = 0;
        return;
    }
    *err = 0;
    lst->ls = ret;
    lst->cn = size;
}    

void lstField_NewList (LSTField * lst, uint16_t size, uint8_t * err)
{
    typeField* ret = 0;
    ret = (typeField*)malloc(sizeof(typeField)*size);
    if (ret == 0)
    {
        *err = E_RUN_OUT_OF_MEM;
        lst->cn = 0;
        return;
    }
    *err = 0;
    lst->ls = ret;
    lst->cn = size;
}

void lstEvent_ClearList (LSTEvent * lst)
{
   free(lst->ls);
   lst->ls = 0;
   lst->cn = 0;
}    

void lstField_ClearList (LSTField * lst)
{
   free(lst->ls);
   lst->ls = 0;
   lst->cn = 0;
}

struct DM_Field{
    __packed uint8_t value;
};
typedef struct DM_Field typeField;

typedef struct LSTField {
    __packed uint16_t cn;   // Count, number of elements in array
    uint8_t * ls;           // Array  
} LSTField;

struct DM_Command{
    // some data
    LSTField lstFields;
};
typedef struct DM_Command typeCommand;

Platform: STM32L1XX.

EDIT: This code resembles closer the reality.

Code does not work

typeCommand * p = 0;
for (uint16_t i=0; i<1000; i++)
{
    // numEvents gets corrupted after some cycles
    lstEvent_NewList(&lstEvent, 1, &err); // creates a list of typeEvent in heap
    //...
    lstEvent_FirstNode(&lstEvent)->numEvents = events; // access first element of list directly
    //...
    p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
    lstField_NewList(&p->lstFields, 5, &err); // THIS LINE (was thought to be causing) CORRUPTION IN numEvents
    //...
    lstEvent_ClearList(&lstEvent);
    lstField_ClearList(&p->lstFields);
    p = 0;
}

Code does work

typeCommand * p = 0;
typeEvent * pEvent;   // use an auxiliar to iterate through the list
for (uint16_t i=0; i<1000; i++)
{
    lstEvent_NewList(&lstEvent, 1, &err);
    //...
    pEvent = lstEvent_FirstNode(&lstEvent);
    pEvent->numEvents = events;
    //...
    p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
    lstField_NewList(&p->lstFields, 5, &err); // all is good
    //...
    lstEvent_ClearList(&lstEvent);
    lstField_ClearList(&p->lstFields);
    free(p);  // freeing properly now
    p = 0;
}
Paul
  • 3
  • 5
  • 1
    [Please don't cast the return value of `malloc()` in C](http://stackoverflow.com/a/605858/28169). – unwind Oct 28 '15 at 11:23
  • 2
    I cannot see how your change would make a difference - according to the C standards, the two variants must do the same. Can you provide a SSCCE ( http://sscce.org/ ) to demonstrate the problem? – sleske Oct 28 '15 at 11:42
  • Provide a [mcve]. Do you have an interrupt running, generating a race condition? – too honest for this site Oct 28 '15 at 11:52
  • Thanks @unwind, I will remove that. – Paul Oct 28 '15 at 12:16
  • I will try to provide a SSCCE. I thought the whole code was too large to post it here, and reduced it to something manageable. I don't have any interrupts @Olaf. – Paul Oct 28 '15 at 12:17
  • Are you able to run this code outside of the embedded environment? If so, try running your code under valgrind. If you're stepping on memory you're not supposed to, it will tell you where. – dbush Oct 28 '15 at 12:19
  • @Paul: The whole point of an SSCCE is to avoid posting "the whole code". That's the "short" in "SSCCE" (or the "minimal" in "Minimal, Complete, and Verifiable") :-). – sleske Oct 28 '15 at 12:22
  • The behavior you describe is typical for any memory corruption bug. Somewhere in your program there is undefined behavior, and by shuffling around a few instructions, you somehow manage to temporarily suppress the bug. Most likely the memory corruption bug is inside the function where you do the malloc, but it is not necessarily related to malloc/heap. My guess is that the stack frame of the function gets corrupted by code which is writing to the wrong memory location somewhere. – Lundin Oct 28 '15 at 12:25
  • What happens if you remove the "__packed" attribute from the struct and try the original code? – Lundin Oct 28 '15 at 12:30
  • Also, what's the mysterious `uint8` that appears in your code? Is it defined? Why not use `uint8_t`? – Lundin Oct 28 '15 at 12:34
  • What is the definition of `typeEvent*`? What is `lstField_NewList()` SUPPOSED to do? Looking at the code, you're allocating 1 byte. When you call `lst_Event_FirstNode()` using the same pointer, you're casting your 1 byte allocation to a `typeEvent*`, then setting some member of it to a value. Unless that member is the first member and only a byte, you're overwriting something you shouldn't. Of course, the `//...` you've got in your code might mean you're working on a different allocation and this is a red herring. – Russ Schultz Oct 28 '15 at 14:04
  • Apologies to all for posting an inconcrete piece of code. @Lundin `uint8` is a type error, should be `uint8_t`; and thanks for the guess, I also believe something like that is happening. I`m working on a better example code, coming soon... – Paul Oct 28 '15 at 14:45
  • @Paul Type as in typing? If so, please post a copy/paste of the actual code instead. – Lundin Oct 28 '15 at 14:46
  • 1
    While looking for a SSCCE I've found the real problem. The memory taken by `p` was not being freed, equaling the pointer to zero in each loop instead. For some reason, as @Lundin suggested, when changing the code using `pEvent` auxiliar the problem ceased to show up. But the bug was still there. – Paul Oct 28 '15 at 15:30
  • Apologies again for posting such a messy code. I will take greater care next time. Thanks all for your comments. – Paul Oct 28 '15 at 15:31

0 Answers0