
I'm observing some weird behaviour from a relatively simple piece of code and would like to ask whether anybody has seen this before.

The code below is a basic example of allocating memory inside a function and returning a pointer to it (you can find similar examples in most C textbooks).

// BROKEN CODE EXAMPLE:
#include <stdlib.h>

// Function which returns a char pointer
char *f(void) {
    // Allocate some memory; 1024 is merely an example
    char *c = malloc(1024);
    return c;
}

// Calling function
void main_f(void) {
    char *c = f();
    // Do some stuff here
    free(c);
}

In my actual program the main_f() function is repeatedly called from its main loop. Everything works as expected when the main_f() function is compiled into the program's executable or comes from a library that the executable is linked to.

However, if the code resides in a shared library that the executable loads with dlopen(), something strange happens: after a random number of iterations (5, 15, 50 or even more) the code crashes with SIGSEGV. Debugging shows that the crash happens exactly at the return statement.

Adding to the weirdness, I found by trial and error that a simple cure for the crash is not to return a pointer from f() at all, but to pass the function a double pointer (char **) instead:

// WORKING CODE EXAMPLE:
#include <stdlib.h>

// Modify the function to use a double pointer
void f(char **c) {
    // Allocate some memory; 1024 is merely an example
    *c = malloc(1024);
}

// Calling function
void main_f(void) {
    char *c;
    f(&c);
    // Do some stuff here
    free(c);
}
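The second form works by giving f() the address of the caller's local pointer, so the callee writes the new pointer straight into the caller's variable. A minimal sketch of that out-parameter pattern, assuming a hypothetical helper `alloc_buffer()` that also reports failure through its return value (an illustration only, not the program's actual code):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `size` bytes and stores the pointer in *out.
 * Returns 0 on success, -1 on failure (after a failed malloc, *out is NULL). */
int alloc_buffer(char **out, size_t size)
{
    if (out == NULL)
        return -1;                 /* nowhere to store the result */
    *out = malloc(size);           /* write through the caller's pointer */
    return (*out != NULL) ? 0 : -1;
}

/* Caller: passes the address of its own local pointer. */
int use_buffer(void)
{
    char *c = NULL;
    if (alloc_buffer(&c, 1024) != 0)
        return -1;                 /* allocation failed; nothing to free */
    c[0] = 'x';                    /* "do some stuff here" */
    free(c);
    return 0;
}
```

Returning a status code alongside the out-parameter is a design choice: it lets the caller tell "allocation failed" apart from success without inspecting the pointer.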

The actual code is part of a GTK+ program and runs inside the GLib main loop. The program does not create additional threads; it executes the code above once a minute from a timer, and runs do not overlap. The shared library is loaded only once, via dlopen(), at init time.

Could it be that malloc(), dlopen() and GLib do not always play nicely together? The actual program is UNIX-only, so we saw no need to use GLib's portable wrappers such as g_malloc() or larger abstractions such as GModule; would it make sense to prefer them anyway?

Has anybody else seen this issue?

assen.totin
  • Use `int main(void)` instead of `void main()` – David Ranieri Feb 04 '15 at 10:09
  • Is the shared library compiled with the `-fPIC` parameter? Have a look at http://stackoverflow.com/questions/5311515/gcc-fpic-option – Sebastian Stigler Feb 04 '15 at 10:14
  • Do I understand correctly that, in the shared-library case, `malloc` is called inside the shared object and `free` inside the main executable? If so, that is a no-go at least on Windows (not sure about UNIX though); try calling `free` from the same module, and hence the same heap, where `malloc` is called. – Rudolfs Bundulis Feb 04 '15 at 10:36
  • @AlterMann `main_f()` is not necessarily the `main()` function; as I explained, the code above is an example. On a side note, `void main()` is both perfectly legal and irrelevant to the issue discussed here. – assen.totin Feb 04 '15 at 10:36
  • No, `void main` is not legal. From the standard: The function called at program startup is named main. The implementation declares no prototype for this function. **It shall be defined with a return type of int** and with no parameters or with two parameters (referred to here as argc and argv) – David Ranieri Feb 04 '15 at 10:40
  • @RudolfsBundulis No, `free()` is also called in the shared library's code (and while the library is still opened). Also, the error happens at `return`, not at `free()`. – assen.totin Feb 04 '15 at 10:40
  • @SebastianStigler Yes, the code for the shared library is linked with `-fPIC`. I use `libtool` which takes care of this. – assen.totin Feb 04 '15 at 10:46
  • "The crash happens when return is called" but when exactly? It might help to pinpoint the exact assembler instruction that causes the segfault. – nwellnhof Feb 04 '15 at 11:29
  • @assen.totin can you please make sure you post the **exact code** that has been observed to show the problem (along with the compiler command line for each file in this case)? In creating a simplified example you may have left out some important detail. Creating a simplified example is good, but you must also check that the simplified example still exhibits the problem. – M.M Feb 04 '15 at 11:37
  • "*Debugging the code reveals that the crash happens exactly when return is called.*" This strongly smells like memory corruption that happened (long?) before the actual crash. You might consider running the whole program under a memory-(access-)checker like Valgrind (http://valgrind.org). – alk Feb 04 '15 at 12:03
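Rudolfs Bundulis's suggestion corresponds to a common idiom for shared libraries: the module that allocates a buffer also exports the function that releases it, so callers never mix allocators. A minimal sketch with hypothetical names `lib_buf_new()` and `lib_buf_free()`. On typical UNIX systems the executable and its shared libraries usually share one malloc(), so this is mainly API hygiene there, and it is not offered as the cause of the crash (the asker notes that free() already runs inside the library):

```c
#include <stddef.h>
#include <stdlib.h>

/* In a real plugin both functions would be compiled into the same shared
 * library, so a buffer is always released by the module that created it. */

/* Hypothetical allocator exported by the library: returns a zeroed buffer
 * of `size` bytes, or NULL on failure. */
char *lib_buf_new(size_t size)
{
    return calloc(size, 1);
}

/* Matching deallocator exported by the same library.
 * free(NULL) is a no-op, so passing NULL is safe. */
void lib_buf_free(char *buf)
{
    free(buf);
}
```

Callers then pair every `lib_buf_new()` with `lib_buf_free()` instead of calling `free()` directly.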

1 Answer


Short answer: you have either a bug or bad hardware. I strongly suspect a bug. If it is a bug, then either a stray pointer has run amok and trashed your program's memory, or you're freeing memory improperly.

I recognize that you provided a simplified example to demonstrate what was happening, but even your example has a bug: you call malloc() but never check its return value. If you later try to free NULL, this could be the cause.
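A minimal sketch of the question's returning version with the allocation checked, keeping the names `f()` and `main_f()` from the question; the `int` return type of `main_f()` is an assumption added here so the caller can report failure. Note that `free(NULL)` itself is a no-op in standard C (as M.M points out below); the real danger is using the NULL pointer before freeing it:

```c
#include <stdlib.h>

/* Returns a 1024-byte buffer, or NULL if malloc() fails. */
char *f(void)
{
    char *c = malloc(1024);
    return c;                      /* may be NULL; the caller must check */
}

/* Returns 0 on success, -1 if the allocation failed. */
int main_f(void)
{
    char *c = f();
    if (c == NULL)
        return -1;                 /* nothing allocated, nothing to use or free */
    c[0] = '\0';                   /* "do some stuff here" */
    free(c);
    return 0;
}
```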

The fact that a minor modification to your code mysteriously solves the problem only reinforces the bug theory.

However, if you are 100% sure there is no bug (and then you check again and are REALLY sure there's no bug), then the hardware problem is most likely bad RAM. Try running your code on another machine: if it crashes there, it's a bug. Otherwise, simply swapping the RAM should confirm whether you have a hardware failure.

user590028
  • `free(NULL)` is well-defined and does nothing. (Perhaps null pointers are dereferenced before being freed though, which would cause a problem) – M.M Feb 04 '15 at 11:36
  • Of course, the actual code has proper validation of malloc()'s return value and in case of failure nothing is ever attempted with this pointer. But I'll keep digging, thanks. – assen.totin Feb 04 '15 at 12:06