
I usually love well-explained questions and answers. But in this case I really can't give any more clues.

The question is: why is malloc() giving me SIGSEGV? The debug session below shows that the program has no chance to test the returned pointer against NULL and exit. The program dies INSIDE MALLOC!

I'm assuming malloc in my glibc is just fine. I have an up-to-date Debian Linux (wheezy) system on an old Pentium (i386/i486 arch).

To be able to track it down, I generated a core dump. Let's follow it:

iguana$gdb xadreco core-20131207-150611.dump

Core was generated by `./xadreco'.
Program terminated with signal 11, Segmentation fault.
#0  0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0  0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1  0xb76824bc in malloc () from /lib/i386-linux-gnu/libc.so.6
#2  0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x...) at xadreco.c:4519
#3  0x0804b93a in geramov (tabu=..., nmovi=0xbfd411f8) at xadreco.c:1473
#4  0x0804e7b7 in minimax (atual=..., deep=1, alfa=-105000, bet...) at xadreco.c:2778
#5  0x0804e9fa in minimax (atual=..., deep=0, alfa=-105000, bet...) at xadreco.c:2827
#6  0x0804de62 in compjoga (tabu=0xbfd41924) at xadreco.c:2508
#7  0x080490b5 in main (argc=1, argv=0xbfd41b24) at xadreco.c:604
(gdb) frame 2
#2  0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x ...) at xadreco.c:4519
4519        movimento *paux = (movimento *) malloc (sizeof (movimento));
(gdb) l
4516 
4517    void enche_pmovi (movimento **cabeca, movimento **pmovi, int c0, int c1, int c2, int c3, int p, int r, int e, int f, int *nmovi)
4518    {
4519        movimento *paux = (movimento *) malloc (sizeof (movimento));
4520        if (paux == NULL)
4521            exit(1);

Of course I need to look at frame 2, the last one on the stack that belongs to my code. But line 4519 itself raises SIGSEGV! The program never gets to test, on line 4520, whether paux == NULL.

Here is "movimento" (abbreviated):

typedef struct smovimento
{
    int lance[4];  //move in integer notation
    int roque; // etc. ...

    struct smovimento *prox;// pointer to next
} movimento;

This program can use a LOT of memory, and I know memory is at its limit. But I thought malloc would cope better when memory is not available.

Running $ free -h during execution, I can see free memory drop as low as 1MB! That's OK: the old computer only has 96MB, and 50MB is used by the OS.

I don't know where to start looking. Maybe check the available memory BEFORE each malloc call? But that sounds like a waste of computing power, since malloc supposedly does that already. sizeof (movimento) is about 48 bytes. If I test beforehand, at least I'll have some confirmation of the bug.

Any ideas, please share. Thanks.

DrBeco
  • Do you have any large allocations on the stack in some of the functions prior to the one which SIGSEGVs? The stack has a limited size, and a program can crash this way if it's exceeded. – Atle Dec 07 '13 at 19:03
  • Things like this can be the result of an overwrite in a memory area that is used by malloc internally. This could be an off-by-one index or an access via a stale pointer. – wildplasser Dec 07 '13 at 19:09
  • @Atle Yes, I have. But I am very cautious to check every new allocation. I want at least to be able to printf("memory full\n"), and not get a SIGSEGV like this. How would I keep track of the free stack size before it crashes? Thanks. – DrBeco Dec 07 '13 at 19:12
  • @wildplasser In this case, as you can see, the code is in the start of a function. It's a local pointer, brand new, and malloc is the first statement in that function. How would I check that with some tests? – DrBeco Dec 07 '13 at 19:14
  • If you allocate on the stack, you know how much you're allocating. You can run `ulimit -a | grep stack` to see what the limit is. If you want to find out whether this is the reason for the crash, you should try replacing some big stack allocations with `malloc()` and see if it works. To check if you are corrupting memory someplace, as @wildplasser suggests, use `valgrind`. – Atle Dec 07 '13 at 19:16
  • I'll try valgrind. But just to be clear: I misunderstood you. I do not allocate things directly on the stack. I'm using only malloc. Thanks. If you guys have any more tips, I'll be glad to hear them. As soon as I try, I'll keep this updated, till we find the bug. – DrBeco Dec 07 '13 at 19:20
  • `the code is in the start of a function` Totally irrelevant. malloc() segfaults because you damaged its bookkeeping structures (typically located before the beginning and/or after the end of the allocated chunks). – wildplasser Dec 07 '13 at 19:23
  • you **need not** and **should not** cast the return value of malloc. check [this](http://stackoverflow.com/q/605845/2173917). – Sourav Ghosh Dec 07 '13 at 19:28
  • It's unlikely that stack usage has anything to do with this crash. And the `free` command has nothing to do with actual memory availability; its only purpose is showing you cache utilization efficiency and similar. The crash is almost certainly caused by a heap-based buffer overflow. – R.. GitHub STOP HELPING ICE Dec 07 '13 at 19:49
  • @SouravGhosh Do you really think avoiding the cast on `malloc()` will help solve this problem? In this case, I'd strongly suggest _avoiding_ changing such casts until _after_ the problem is solved. OP could have hundreds (lines of code >= 4521) of such casts and changing them now would simply not be productive. – chux - Reinstate Monica Dec 07 '13 at 20:18
  • @R.. Hello R. Would you be kind enough to elaborate on that "heap-based buffer overflow" so I'll have more food for thought (and more to research)? Thanks a lot. – DrBeco Dec 08 '13 at 05:04
  • @chux avoiding the cast might not solve the problem, but will surely improve the code standard. I understand this is not the answer to the problem [that is why this is a comment instead of an answer], but is mentioning this completely irrelevant? – Sourav Ghosh Dec 08 '13 at 10:58
  • @SouravGhosh It is not relevant to the post. signal 11 SIGSEGV is certainly not a `malloc()` casting issue. The bold text to bring attention to it was distracting. – chux - Reinstate Monica Dec 08 '13 at 19:36
  • Hi @SouravGhosh, I don't think it's irrelevant. But for the moment, it's misleading. I could run into serious problems if I were naive enough to change the code while debugging such a complex problem. After solving this problem, I'll consider your suggestion as an improvement. Thanks. – DrBeco Dec 12 '13 at 01:17

1 Answer


Any crash inside malloc (or free) is an almost sure sign of heap corruption, which can come in many forms:

  • overflowing or underflowing a heap buffer
  • freeing something twice
  • freeing a non-heap pointer
  • writing to a freed block
  • etc.

These bugs are very hard to catch without tool support, because the crash often comes many thousands of instructions, and possibly many calls to malloc or free later, in code that is often in a completely different part of the program and very far from where the bug is.
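
To make the delayed failure concrete, here is a minimal sketch of a debug wrapper that surrounds each block with guard bytes and inspects them later. The names `guarded_malloc`, `guarded_check` and `guarded_free` are hypothetical helpers invented for this illustration. They are not part of glibc and do not reflect how glibc lays out its own bookkeeping. The sketch also assumes that 16-byte alignment is enough for the stored data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD      8     /* number of guard bytes on each side of the user data */
#define HEADER     16    /* size field plus front guard; keeps user data 16-byte aligned */
#define GUARD_BYTE 0xA5  /* arbitrary fill pattern for the guards */

/* Raw layout: [size, padded to 8 bytes][front guard][user data][back guard] */
void *guarded_malloc(size_t n)
{
    unsigned char *raw = malloc(HEADER + n + GUARD);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);                        /* remember the user size */
    memset(raw + HEADER - GUARD, GUARD_BYTE, GUARD);  /* front guard */
    memset(raw + HEADER + n, GUARD_BYTE, GUARD);      /* back guard */
    return raw + HEADER;
}

/* Returns 0 if both guards are intact, -1 if the front guard was
   overwritten (underflow), 1 if the back guard was overwritten (overflow). */
int guarded_check(const void *p)
{
    assert(p != NULL);
    const unsigned char *user = p;
    const unsigned char *raw = user - HEADER;
    size_t n;
    memcpy(&n, raw, sizeof n);
    for (size_t i = 0; i < GUARD; i++)
        if (raw[HEADER - GUARD + i] != GUARD_BYTE)
            return -1;
    for (size_t i = 0; i < GUARD; i++)
        if (user[n + i] != GUARD_BYTE)
            return 1;
    return 0;
}

/* Checks the guards, releases the block, and returns the check result. */
int guarded_free(void *p)
{
    if (p == NULL)
        return 0;
    int status = guarded_check(p);
    free((unsigned char *)p - HEADER);
    return status;
}
```

With this wrapper, a write one byte past a block only damages a guard, and nothing fails at that moment. The damage is noticed at the next `guarded_check` or `guarded_free`, which may be far from the faulty write. A real allocator keeps its own data next to your blocks in a similar way, which is why the crash shows up inside a later malloc or free.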

The good news is that tools like Valgrind or AddressSanitizer usually point you straight at the problem.
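
As an illustration of the kind of bug these tools flag, here is a small sketch with a deliberate off-by-one heap overflow next to a corrected version. The function names `dup_buggy` and `dup_fixed` are made up for this example and have nothing to do with the code in the question.

```c
#include <stdlib.h>
#include <string.h>

/* BUG (deliberate): allocates strlen(s) bytes, so the terminating '\0'
   written by strcpy lands one byte past the end of the heap block. */
char *dup_buggy(const char *s)
{
    char *copy = malloc(strlen(s));
    if (copy == NULL)
        return NULL;
    strcpy(copy, s);
    return copy;
}

/* Corrected version: reserves room for the terminating '\0' as well. */
char *dup_fixed(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    return copy;
}
```

Calling `dup_buggy` from a test program will often appear to work when run normally. Running the same binary under `valgrind` should report an invalid write of size 1 together with the stack of the faulty call. Alternatively, building with `-g -fsanitize=address` on a compiler that supports it (GCC 4.8 or later, or Clang) should abort with a heap-buffer-overflow report at the `strcpy`.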

Employed Russian