gdb watch huge amount of memory to find out corruption, no seg fault here

Question

Updated: when I run it with valgrind --tool=memcheck --track-origins=yes --leak-check=full ./prog, the program runs correctly, but without valgrind it still goes wrong. How can that happen?

I'm working on a Linux project that stores a lot of data in memory, and I need to find out which data block gets changed in order to track down the problem in my program.

Updated: This is a multithreaded program, and the writes and reads are done by different threads, which are created by system calls.

The code looks like this:

for (j = 0; j < save_size; j++) {
    e->blkmap_mem[blk_offset + save_offset + j] = get_mfs_hash_block();
    memcpy(e->blkmap_mem[blk_offset + save_offset + j]->data, (char *)buff + j * 4096, 4096);
    e->blkmap_mem[save_offset + j]->data = (char *)(buff + j * 4096);
    e->blkmap_mem[blk_offset + save_offset + j]->size = 4096;
    e->blkmap_addr[blk_offset + save_offset + j] = 1;
}

I want to know whether e->blkmap_mem[blk_offset+save_offset+j]->data is changed somewhere else.

I know that awatch exp in gdb can report when a value is accessed or changed, but there are too many values here, possibly nearly 6,000. Is there some way to trace them all?

Thanks, guys.

By changed somewhere else, do you mean corrupted by a bad write? Or do you mean the change might be legit? – FatalError, Mar 19 '12 at 13:43
Yes, I don't know whether I accidentally overwrote it. It should be written only once, but now it is corrupted. – bxshi, Mar 19 '12 at 13:52
In that case, I suggest you check out `valgrind`. Run your program through it and see if it spits out anything interesting. – FatalError, Mar 19 '12 at 13:56
I used `valgrind ./my_prog` to run my program. There is no memory leak, but it still goes wrong. – bxshi, Mar 19 '12 at 14:08
Now with `valgrind --tool=memcheck --track-origins=yes --leak-check=full ./prog` it runs correctly, but without `valgrind` it still goes wrong. How can that happen? – bxshi, Mar 19 '12 at 14:24

Timothy Jones · Answer 1 · 2016-08-08T01:04:48.950

7

Reverse debugging has a great use case here, assuming you have some way to detect the corruption once it's happened (a seg fault will do fine).

Once you've detected the corruption in a debugging session, you put a watch point on the corrupted variable, and then run the program backwards until the variable was written to.

Here's a step-by-step guide:

1. Compile the program with debugging symbols as usual and load it into gdb.
2. Start the program using `start`.
   - This puts a breakpoint at the very beginning of main, and runs the program until it hits it.
3. Now, put a breakpoint somewhere where the memory corruption is detected.
   - You don't need to do this if you're detecting the corruption with a seg fault.
4. Type `record` to start recording program execution.
   - This is why we called `start` before: you can't record when there's no process running.
5. Type `continue` to set the program running again.
   - While recording, the program will run very slowly.
   - It may tell you the record buffer is full. If this happens, tell it to wrap around.
6. When your corruption is detected by your breakpoint or the seg fault, the program will stop. Now put a watch on whatever the corrupted variable is.
7. Type `reverse-continue` to run the program backwards until the corrupted variable is written to.
8. When the watchpoint hits, you've found your corruption.
   - Note that it's not always the first or only corruption of that variable. But you can always keep running backwards until you run out of reverse execution history, and now you've got something to fix.
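
The breakpoint step above needs some place in the code where the corruption becomes visible, which the asker does not have yet. One possible way to get it, shown here only as a minimal sketch: record a checksum of each block right after its single legitimate write, and re-check all blocks later. Everything in this sketch is hypothetical and not taken from the asker's code: `struct block`, `write_block`, `verify_blocks`, `corruption_detected`, `run_demo`, the small `NUM_BLOCKS`, and the choice of FNV-1a as the checksum. It also assumes each block really is written only once, as stated in the comments on the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 8            /* the question has ~6,000; kept small here */

/* Hypothetical stand-in for the asker's block type. */
struct block {
    char data[BLOCK_SIZE];
    uint32_t checksum;          /* recorded right after the one legitimate write */
};

static struct block blocks[NUM_BLOCKS];

/* 32-bit FNV-1a hash; any checksum would do for this purpose. */
static uint32_t block_hash(const char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;
    }
    return h;
}

/* Breakpoint target: in gdb, `break corruption_detected`.
 * Build without optimization so the call is not inlined away. */
void corruption_detected(int index)
{
    fprintf(stderr, "block %d changed after it was written\n", index);
}

/* The single legitimate write: copy the data, then remember its checksum. */
void write_block(int index, const char *src)
{
    assert(index >= 0 && index < NUM_BLOCKS);
    memcpy(blocks[index].data, src, BLOCK_SIZE);
    blocks[index].checksum = block_hash(blocks[index].data, BLOCK_SIZE);
}

/* Re-hash every block and report those that no longer match.
 * Returns the number of mismatching blocks. */
int verify_blocks(void)
{
    int bad = 0;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (block_hash(blocks[i].data, BLOCK_SIZE) != blocks[i].checksum) {
            corruption_detected(i);
            bad++;
        }
    }
    return bad;
}

/* Fills all blocks, simulates one stray write, and returns the number
 * of blocks that verify_blocks() flags. */
int run_demo(void)
{
    char buff[BLOCK_SIZE];
    for (int i = 0; i < NUM_BLOCKS; i++) {
        memset(buff, 'a' + i, sizeof buff);
        write_block(i, buff);
    }
    blocks[3].data[100] ^= 0x01;    /* simulated bad write into block 3 */
    return verify_blocks();
}

#ifdef CHECKSUM_DEMO_MAIN
int main(void)
{
    return run_demo() == 1 ? 0 : 1;
}
#endif
```

To try it standalone, compile with `-g -O0 -DCHECKSUM_DEMO_MAIN`. In a recorded gdb session, a breakpoint on `corruption_detected` gives the stopping point; from there, `index` tells you which block to put the watchpoint on before running `reverse-continue`. In a real program, `verify_blocks()` would be called at points where no thread is expected to be writing.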

There's a useful tutorial here, which also discusses how to control the size of the record buffer, in case that becomes an issue for you.

edited Aug 08 '16 at 01:04

answered Mar 19 '12 at 14:12


The thing is, there is no seg fault or anything else that could show me where the problem is. It runs okay, just with the wrong result. Really disappointing. – bxshi Mar 19 '12 at 14:17
Put an `if` statement in a sensible place that would only trigger if the variable was corrupted. Then put your breakpoint in there. Or, put the breakpoint at the end of the program and just watch one of the incorrect variables before reverse-continuing. – Timothy Jones Mar 19 '12 at 14:20
By the way, my program is multithreaded. Could it still be recorded correctly? The reads and writes are not done in one thread, and these threads are created by system calls. – bxshi Mar 19 '12 at 14:32
I've never tried, but it [looks like it should work](http://stackoverflow.com/questions/7517236/how-do-i-enable-reverse-debugging-on-a-multi-threaded-program). You should edit your question to include the multi-threaded nature of your program - from your success with valgrind, sounds like it might be a race condition. – Timothy Jones Mar 19 '12 at 14:36
I'm trying to check for race conditions with helgrind. It runs really slowly, and so far no race condition warning has appeared. Is there any other way to check for this? I don't want to review my entire code again. – bxshi Mar 19 '12 at 14:55
Any suggestion on how to do this in Android? gdb says that the record and reverse-continue commands are not supported by the platform. – amfcosta Jun 21 '13 at 21:13
