
I've set "ulimit -c unlimited" on my Fedora system so segfaults generate core dump files. This is working.

I've seen an NT_FILE note mentioned at these URLs:

ELF core file format

Anatomy of an ELF core file

But my core files only contain these notes:

$ readelf --notes core.simple.11

Notes at offset 0x000003f8 with length 0x00000558:
  Owner     Data size   Description
  CORE      0x00000150  NT_PRSTATUS (prstatus structure)
  CORE      0x00000088  NT_PRPSINFO (prpsinfo structure)
  CORE      0x00000130  NT_AUXV (auxiliary vector)
  CORE      0x00000200  NT_FPREGSET (floating point registers)

Why is there no NT_FILE note? How can I figure out the various object files the core file may be based on, and more importantly, the virtual addresses where those files were mapped into the core image?

Without the address mapping information from the NT_FILE note, I don't know how I can perform code address lookups in the DWARF debugging info from the object files.

Program headers in the core file:

$ readelf --segments core.simple.11

Elf file type is CORE (Core file)
Entry point 0x0
There are 17 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x00000000000003f8 0x0000000000000000 0x0000000000000000
                 0x0000000000000558 0x0000000000000000         0
  LOAD           0x0000000000001000 0x0000000000400000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  R E    1000
  LOAD           0x0000000000002000 0x0000000000600000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     1000
  LOAD           0x0000000000003000 0x00000035fe800000 0x0000000000000000
                 0x0000000000001000 0x000000000001e000  R E    1000
  LOAD           0x0000000000004000 0x00000035fea1d000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  R      1000
  LOAD           0x0000000000005000 0x00000035fea1e000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     1000
  LOAD           0x0000000000006000 0x00000035fea1f000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     1000
  LOAD           0x0000000000007000 0x00000035fec00000 0x0000000000000000
                 0x0000000000001000 0x0000000000173000  R E    1000
  LOAD           0x0000000000008000 0x00000035fed73000 0x0000000000000000
                 0x0000000000000000 0x00000000001ff000         1000
  LOAD           0x0000000000008000 0x00000035fef72000 0x0000000000000000
                 0x0000000000004000 0x0000000000004000  R      1000
  LOAD           0x000000000000c000 0x00000035fef76000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     1000
  LOAD           0x000000000000d000 0x00000035fef77000 0x0000000000000000
                 0x0000000000005000 0x0000000000005000  RW     1000
  LOAD           0x0000000000012000 0x00007fc22db59000 0x0000000000000000
                 0x0000000000003000 0x0000000000003000  RW     1000
  LOAD           0x0000000000015000 0x00007fc22db6c000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     1000
  LOAD           0x0000000000016000 0x00007fff81c40000 0x0000000000000000
                 0x0000000000016000 0x0000000000016000  RW     1000
  LOAD           0x000000000002c000 0x00007fff81dee000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  R E    1000
  LOAD           0x000000000002d000 0xffffffffff600000 0x0000000000000000
                 0x0000000000000000 0x0000000000001000  R E    1000

Program headers in the executable file:

$ readelf --segments simple

Elf file type is EXEC (Executable file)
Entry point 0x400390
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000674 0x0000000000000674  R E    200000
  LOAD           0x0000000000000678 0x0000000000600678 0x0000000000600678
                 0x00000000000001e4 0x00000000000001f8  RW     200000
  DYNAMIC        0x00000000000006a0 0x00000000006006a0 0x00000000006006a0
                 0x0000000000000190 0x0000000000000190  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000005a8 0x00000000004005a8 0x00000000004005a8
                 0x000000000000002c 0x000000000000002c  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07

My Linux version:

$ uname -a
Linux somehost 2.6.32.23-170.fc12.x86_64 #1 SMP Mon Sep 27 17:23:59 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
jps
barryd
  • The NT_FILE note was added to the kernel in 2012, so a kernel built in 2010 won't have it. Can you get a newer kernel? – Mark Plotnick Sep 29 '16 at 14:33
  • Upgrading the kernel on our production systems where I need this to work would not be taken lightly I suspect. But I figure if gdb doesn't need the NT_FILE note to associate the rip register and other code addresses with entries in the .eh_frame and .debug_line tables, then there must be another way to work out the mapping of the object files to the core image address space. I just have to find out what it is. – barryd Sep 29 '16 at 15:27
  • While updating the kernel for our current production systems is probably out of the question, we are setting up new production systems which will be based on a recent CentOS release, I believe. Once they've been proven, we'll move our operations across. So we won't be stuck with an old kernel forever. Just FYI :) – barryd Oct 01 '16 at 10:11

1 Answer


Why is there no NT_FILE note?

As noted by Mark Plotnick, it's a fairly recent kernel addition.

GDB does not need the NT_FILE note at all. In fact, current GDB doesn't appear to use NT_FILE except when writing a core file with the gcore command.

How can I figure out the various object files the core file may be based on, and more importantly, the virtual addresses where those files were mapped into the core image?

GDB looks at PT_DYNAMIC for the main executable in the core and extracts DT_DEBUG from it. DT_DEBUG gives a pointer to _r_debug, whose r_map field is the head of a linked list of struct link_map, with each node in the list describing one loaded ELF file.
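The chain described above can be sketched in Python. This is a minimal illustration and not GDB's actual code. It assumes a little-endian 64-bit target and the public glibc layouts of struct r_debug and struct link_map from <link.h>. The read_mem(addr, size) callback is hypothetical: it stands for whatever routine returns bytes from the core image at a virtual address. Note that the dynamic section must be read from the core, because DT_DEBUG is only filled in at run time by the dynamic loader.

```python
import struct

DT_NULL = 0
DT_DEBUG = 21  # d_tag value for DT_DEBUG in <elf.h>


def find_dt_debug(dynamic_bytes):
    """Scan Elf64_Dyn entries (d_tag, d_val: 8 bytes each) for DT_DEBUG."""
    for off in range(0, len(dynamic_bytes) - 15, 16):
        d_tag, d_val = struct.unpack_from("<qQ", dynamic_bytes, off)
        if d_tag == DT_NULL:
            break
        if d_tag == DT_DEBUG:
            return d_val  # run-time address of _r_debug
    return None


def read_cstring(read_mem, addr, limit=4096):
    """Read a NUL-terminated string one byte at a time via read_mem."""
    out = bytearray()
    while len(out) < limit:
        byte = read_mem(addr + len(out), 1)
        if not byte or byte == b"\0":
            break
        out += byte
    return out.decode("utf-8", "replace")


def walk_link_map(read_mem, r_debug_addr):
    """Return [(l_addr, l_name), ...] for every node in r_debug.r_map."""
    # struct r_debug: int r_version (padded to 8 bytes), then link_map *r_map.
    (node,) = struct.unpack("<Q", read_mem(r_debug_addr + 8, 8))
    modules = []
    while node:
        # struct link_map starts with: l_addr, l_name, l_ld, l_next (then l_prev).
        l_addr, l_name, _l_ld, l_next = struct.unpack("<QQQQ", read_mem(node, 32))
        modules.append((l_addr, read_cstring(read_mem, l_name)))
        node = l_next
    return modules
```

Each l_addr is the difference between the addresses in the ELF file and the addresses in memory, so a run-time code address minus l_addr gives the address to look up in that file's debug info.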

The (gdb) info shared command will show you a decoded version of the above info, but you need to provide matching binaries: the core alone does not contain sufficient info.

Now, your question isn't very clear and could be understood several different ways.

It could be: "I have a core, which application crashed?" Use file core and hope that the first 16 characters of the pathname are sufficient. If they aren't, running strings core will often reveal which application produced it. You should also consider setting /proc/sys/kernel/core_pattern to something that includes %e or %E, so the question is trivial to answer in the future.

It could be: "I have several versions of application foo, and want to know which version of foo produced this particular core". In that case, you should be linking foo with the -Wl,--build-id linker flag. That flag creates an NT_GNU_BUILD_ID note in the foo binary. The note survives strip, and is saved inside the core file as well. You can then run eu-unstrip -n --core /path/to/core, which will produce output like this:

$ eu-unstrip -n --core core
0x400000+0x208000 c266a51e4b85b16ca17bff8328f3abeafb577b29@0x400284 - - [exe]
0x7ffe3f7d9000+0x1000 7f14688f101a2ace5cad23dfbfbc918616651576@0x7ffe3f7d9340 . - linux-vdso.so.1
0x7fb5b6ec3000+0x2241c8 d0f537904076d73f29e4a37341f8a449e2ef6cd0@0x7fb5b6ec31d8 /lib64/ld-linux-x86-64.so.2 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.19.so ld-linux-x86-64.so.2
0x7fb5b6afe000+0x3c42c0 cf699a15caae64f50311fc4655b86dc39a479789@0x7fb5b6afe280 /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.19.so libc.so.6

From the above output, you can tell exactly which ELF binaries were used and where in memory they were loaded.
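As a rough sketch of how that listing could be used for address lookups, the Python below parses lines in the format shown above and finds the module that contains a given address. The function names are made up for this example, and it assumes none of the paths contain whitespace.

```python
def parse_unstrip_line(line):
    """Parse one 'START+SIZE BUILDID@ADDR FILE DEBUGFILE NAME' line."""
    addr_range, build_id, file_path, debug_path, name = line.split()
    start, size = (int(part, 16) for part in addr_range.split("+"))
    return {
        "start": start,
        "end": start + size,
        # "-" means no build-id was found for this module.
        "build_id": None if build_id == "-" else build_id.split("@")[0],
        "file": file_path,
        "debug_file": debug_path,
        "name": name,
    }


def module_for_address(modules, addr):
    """Return the parsed module whose [start, end) range holds addr, or None."""
    for module in modules:
        if module["start"] <= addr < module["end"]:
            return module
    return None
```

For a module found this way, addr minus its start gives the offset from the module's load address, which is the value needed when the module was loaded somewhere other than its linked-at address.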

P.S. I just tried dumping a core from an a.out built with -Wl,--build-id=none, and the resulting eu-unstrip output is still quite useful:

$ eu-unstrip -n --core core
0x400000+0x202000 - - - [exe]
0x7fff5e1a0000+0x1000 7f14688f101a2ace5cad23dfbfbc918616651576@0x7fff5e1a0340 . - linux-vdso.so.1
0x7fbda432d000+0x2241c8 d0f537904076d73f29e4a37341f8a449e2ef6cd0@0x7fbda432d1d8 /lib64/ld-linux-x86-64.so.2 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.19.so ld-linux-x86-64.so.2
0x7fbda3f68000+0x3c42c0 cf699a15caae64f50311fc4655b86dc39a479789@0x7fbda3f68280 /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.19.so libc.so.6
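For reference, the hex strings in these listings are the payload of each module's NT_GNU_BUILD_ID note. The following is a minimal sketch of extracting it from the raw bytes of an ELF note segment, assuming little-endian data and the standard note record layout (namesz, descsz, type, then name and descriptor, each padded to 4 bytes). The function name find_build_id is made up for this example.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used with owner name "GNU"


def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def find_build_id(note_bytes):
    """Return the build-id as a hex string, or None if no such note exists."""
    off = 0
    while off + 12 <= len(note_bytes):
        namesz, descsz, ntype = struct.unpack_from("<III", note_bytes, off)
        off += 12
        name = note_bytes[off:off + namesz].rstrip(b"\0")
        off += align4(namesz)
        desc = note_bytes[off:off + descsz]
        off += align4(descsz)
        if name == b"GNU" and ntype == NT_GNU_BUILD_ID:
            return desc.hex()
    return None
```

In the executable from the question, the NOTE segment at 0x40021c is 0x44 bytes long, which matches a 16-byte ABI-tag note followed by a 20-byte build-id note.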

Update:

There is no PT_DYNAMIC program header in my core file itself,

No, but PT_DYNAMIC in the executable describes a writable segment @0x6006a0. That segment is actually written to (by the dynamic loader), and thus is always saved in the core (like other modified data).

In your case, the contents are in the PT_LOAD segment @0x600000 (i.e. the segment at file offset 0x2000 in the core).
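The translation from a virtual address to a core file offset can be sketched as follows. The table holds the first three PT_LOAD headers copied from the readelf output in the question. The function name vaddr_to_offset is made up for this example.

```python
# (file_offset, vaddr, filesz, memsz) for the first three PT_LOAD headers
# in the question's core file.
CORE_LOADS = [
    (0x1000, 0x400000, 0x1000, 0x1000),
    (0x2000, 0x600000, 0x1000, 0x1000),
    (0x3000, 0x35FE800000, 0x1000, 0x1E000),
]


def vaddr_to_offset(loads, vaddr):
    """Map a virtual address to an offset in the core file.

    Only the first filesz bytes of a segment are present in the file, so
    addresses beyond that (but still within memsz) return None.
    """
    for offset, seg_vaddr, filesz, _memsz in loads:
        if seg_vaddr <= vaddr < seg_vaddr + filesz:
            return offset + (vaddr - seg_vaddr)
    return None


# The dynamic section at 0x6006a0 is 0x6a0 bytes into the segment at 0x600000.
print(hex(vaddr_to_offset(CORE_LOADS, 0x6006A0)))  # 0x26a0
```

The third entry shows why the filesz check matters: that mapping spans 0x1e000 bytes in memory, but only its first page was written to the core.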

Employed Russian
  • "look at PT_DYNAMIC for the main executable in the core" - There is no PT_DYNAMIC program header in my core file itself, but do you mean I should find part of the executable file embedded in the core file and parse the program headers from that? I'll see about adding a program header dump from my core file to the question. – barryd Oct 01 '16 at 03:45
  • Can I assume it will always be loaded at the vaddr indicated in the executable? ie. Is there no possibility for relocation of this particular data? How can I find it if I haven't identified the executable yet? – barryd Oct 01 '16 at 06:30
  • @barryd A non-`PIE` executable is *always* loaded at linked-at address (it wouldn't work otherwise). I don't know how GDB finds the dynamic segment for `PIE` binaries. You can't find the data without the executable (as I said already). – Employed Russian Oct 01 '16 at 06:48
  • We have been using the psargs data (which "file" gives) and also using %e in the core_pattern. The %e has a rather short length limit and doesn't include a path. psargs is long enough that our path/filename probably won't get truncated, though the arguments will. It should be good enough but I thought there might be a more robust way to determine the executable. I guess build-id is the other option. – barryd Oct 01 '16 at 07:50
  • Do you know how eu-unstrip gets the build-ids & load addresses out from the core file, without being given the executable? – barryd Oct 01 '16 at 08:10
  • @barryd `eu-unstrip` is open source. You can read it to find out. Or ask a separate question, and I can research it for you. (I have a guess of how it works, and I am 80% sure my guess is correct, but I prefer better certainty when giving answers ;-) – Employed Russian Oct 01 '16 at 15:44
  • I'd already downloaded (but not read) the source, but was hoping either 1. it's not as difficult as the gdb source or 2. you already knew how it works off the top of your head. I've now learned relocation is rarer than I thought. It looks like none of my libs are being relocated, and our executables are not PIE, so I don't think I'll have to deal with it after all. – barryd Oct 01 '16 at 20:26
  • @barryd The libraries not being relocated is surprising. Most likely it's because you have prelink enabled. Beware that they may randomly move to a different location the next time prelink runs. – Employed Russian Oct 01 '16 at 22:45
  • Is it still surprising if I only checked with a test program with only 2 libs (libc & ld-linux)? Maybe relocation is necessary due to address conflict on typical executables with many libs, but not with this one? Or should relocation happen regardless of need? That said, I will need this working on real programs so maybe I will have to handle relocation after all. – barryd Oct 02 '16 at 01:13
  • If we are using prelink, I should be fine. We mostly intend to analyse cores right after production, so no chance for prelink to change addresses in between. Plus if prelink removes the need to relocate at runtime, it also removes my need to translate addresses between the executables and the core. – barryd Oct 02 '16 at 01:19