
I have a core file generated from a segfault. When I try to load it into gdb, it doesn't seem to matter how I load it or whether I use the correct executable - I always get this warning from gdb about the core file being truncated:

$ gdb -q /u1/dbg/bin/exdoc_usermaint_pdf_compact /tmp/barry/core.exdoc_usermaint.11
Reading symbols from /u1/dbg/bin/exdoc_usermaint_pdf_compact...done.
BFD: Warning: /tmp/barry/core.exdoc_usermaint.11 is truncated: expected core file size >= 43548672, found: 31399936.

warning: core file may not match specified executable file.
Cannot access memory at address 0x7f0ebc833668
(gdb) q

I am concerned with this error: "BFD: Warning: /tmp/barry/core.exdoc_usermaint.11 is truncated: expected core file size >= 43548672, found: 31399936."

Why does gdb think the core file is truncated? Is gdb right? Where does gdb obtain an expected size for the core file, and can I double-check it?

Background:

I am attempting to improve our diagnosis of segfaults on our production systems. My plan is to take core files from stripped executables in production and use them with debug versions of the executables on our development system, to quickly diagnose segfault bugs. In an earlier version of this question I gave many details related to the similar-but-different systems, but I have since been granted an account on our production system and determined that most of the details were unimportant to the problem.

gdb version:

$ gdb
GNU gdb (GDB) Fedora (7.0.1-50.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.

Linux version:

$ uname -a
Linux somehost 2.6.32.23-170.fc12.x86_64 #1 SMP Mon Sep 27 17:23:59 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

I read this question (and many others) before posting. The asker has somewhat similar goals to mine, but is not getting any error from gdb about a truncated core file. Therefore the information related to that question does not help me with my problem.

  • I'm basically finished for the week but on Monday I'll be struggling to get more tests done and get this working before I have to abort the project. – barryd Sep 16 '16 at 12:52
  • Is the core file size limited on the production box? http://dotmanila.com/2013/09/limit-the-size-of-your-core-files-on-linux/ – Matthew Fisher Sep 16 '16 at 13:16
  • After googling a bit more, I'm beginning to believe the core file might actually be truncated. Not sure how though. We have core_pattern set to a filename, not a program, so it should be pretty reliable. And I thought the ulimit for core files just stops them being generated at all if they're too big. – barryd Sep 16 '16 at 13:31
  • @MatthewFisher Thanks for the tip. I'll definitely look into this on Monday. I didn't really think the files really were truncated but it makes sense, especially if I got the way ulimit works wrong. And I see the sizes might not be what I expect based on the limits.conf entry. – barryd Sep 16 '16 at 13:37
  • My question has been marked as a duplicate. Incorrectly, as far as I can tell. Though if you only read the title, it does appear similar to the supposed duplicate. I had read that question prior to posting this one, and had it been a duplicate or had the information addressed my problem at all, I would not have posted my own question. I am proceeding to edit the question as suggested in the recommendations for dealing with the "duplicate" flag. As I simplify and incorporate new information, the changes may appear somewhat severe. – barryd Sep 19 '16 at 09:04
  • To summarize what I've learned: gdb's failure to read the core file was entirely unrelated to the fact it came from a different system. The core file was truncated. I was able to confirm this myself after researching the basics of the ELF format as used for core dumps. It currently seems that a mis-configured limits.conf core size setting was the cause of the truncation, though there is still some inconsistency with that hypothesis and there appears to be at least 1 more cause for truncation in play, since after setting the limit to "unlimited", some core files are still truncated. – barryd Sep 19 '16 at 09:11
  • I would like to answer my own question (or let MatthewFisher answer if he is able to provide enough detail). I suppose stackoverflow won't let me because it's marked as a duplicate? I've edited my question significantly, so I suppose the appropriate action now is to wait? – barryd Sep 19 '16 at 09:58
  • This question is clearly -not- a duplicate of the one indicated. – Matthew Fisher Sep 19 '16 at 12:46
  • Setting the ulimits can be system-wide or local to a shell, be sure to change it system-wide so that all core will be treated the same. – Matthew Fisher Sep 19 '16 at 12:47
  • @MatthewFisher Thanks for the support. Re. the core size, we set it in /etc/security/limits.conf, and with a * for the user so it applies to everyone. The 1 example that was truncated since then should have been 1.8gb but was cut down to 15mb. Honestly, I'm not sure we want a 1.8gb core file taking space on our prod system, so I'm okay with that. I suspect there might be a hard upper limit for the core size, so unlimited might effectively be a bit less than unlimited. – barryd Sep 20 '16 at 09:04
  • The "inconsistencies in the hypothesis" I referred to in a prior comment are: With the core size set to 20000, a couple of core files were truncated to about 30mb (from around 40/50mb). Another was truncated to 15mb (from 1.8gb). I checked if the file was cut off at a boundary between sections in the core file, but it was not. The cut was mid-section. Also, I noted /tmp (where we dump the cores) has an allocation unit size of 4kb. I don't see any pattern here. – barryd Sep 20 '16 at 09:10
  • Is your whole core file written to file? Could you check: `$ ulimit -a core file size (blocks, -c) 0` – gbajson Sep 28 '16 at 07:19
  • @gbajson Please excuse the formatting in a comment, but ulimit -a gives: core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 15974 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 ... (more deleted) – barryd Sep 29 '16 at 09:53
  • @EmployedRussian Hey, you flagged my question as a duplicate a week ago. Can you please take another look? I edited it to make the main problem clearer, but I don't think it was ever really a duplicate. – barryd Sep 29 '16 at 11:30

2 Answers


The Core Dump File Format

On a modern Linux system, core dump files are formatted using the ELF object file format, with a specific configuration. ELF is a structured binary file format, with file offsets used as references between data chunks in the file.

For core dump files, the e_type field in the ELF file header will have the value ET_CORE.

Unlike most ELF files, core dump files make all their data available via program headers, and no section headers are present. You may therefore choose to ignore section headers in calculating the size of the file, if you only need to deal with core files.
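
You can confirm the type yourself: readelf -h on the core should report "Type: CORE (Core file)", or a few lines of script will do. The following is only a minimal sketch of mine, not a standard tool, and it assumes a little-endian ELF64 core such as an x86_64 Linux system produces:

import struct
import sys

ET_CORE = 4                        # e_type value used for core dump files

with open(sys.argv[1], 'rb') as f:
    ehdr = f.read(64)              # the ELF64 file header is 64 bytes

if ehdr[:4] != b'\x7fELF':
    sys.exit('not an ELF file')

# e_type is the 16-bit field at offset 16 in the ELF header
(e_type,) = struct.unpack_from('<H', ehdr, 16)
print('e_type = %d (%s)' % (e_type,
      'ET_CORE' if e_type == ET_CORE else 'not a core file'))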

Calculating Core Dump File Size

To calculate the ELF file size:

  1. Consider all the chunks in the file, each described by an (offset + size) pair:
    • the ELF file header: (0 + e_ehsize) (e_ehsize is 52 for ELF32, 64 for ELF64)
    • the program header table: (e_phoff + e_phentsize * e_phnum)
    • the program data chunks (aka "segments"): (p_offset + p_filesz)
    • the section header table: (e_shoff + e_shentsize * e_shnum) - not required for core files
    • the section data chunks: (sh_offset + sh_size) - not required for core files
  2. Eliminate any section headers with a sh_type of SHT_NOBITS: these describe data that occupies no space in the file (for example .bss, or sections whose contents have been stripped out), so they contribute nothing to the file size (not required for core files).
  3. Eliminate any chunks of size 0, as they contain no addressable bytes and therefore their file offset is irrelevant.
  4. The end of the file will be the end of the last chunk, which is the maximum of the offset + size for all remaining chunks listed above.

If you find the offsets to the program header or section header tables are past the end of the file, then you will not be able to calculate an expected file size, but you will know the file has been truncated.

Although an ELF file could potentially contain unaddressed regions and be longer than the calculated size, in my limited experience the files have been exactly the size calculated by the above method.
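
If you want to reproduce the expected-size figure yourself, the method above is straightforward to script. The sketch below is my own illustration (it is not the code gdb runs); it assumes a little-endian ELF64 core file, as produced on x86_64 Linux, and it ignores section headers since core dumps do not carry them:

import os
import struct
import sys

path = sys.argv[1]
actual = os.path.getsize(path)

with open(path, 'rb') as f:
    ehdr = f.read(64)
    # ELF64 header fields: e_phoff at offset 32, then e_ehsize,
    # e_phentsize and e_phnum at offsets 52, 54 and 56
    (e_phoff,) = struct.unpack_from('<Q', ehdr, 32)
    e_ehsize, e_phentsize, e_phnum = struct.unpack_from('<HHH', ehdr, 52)

    # chunks are (offset, size) pairs: the file header, the program
    # header table, then one chunk per segment
    chunks = [(0, e_ehsize), (e_phoff, e_phentsize * e_phnum)]

    # one (p_offset, p_filesz) chunk per program header; a real tool
    # would also handle the e_phnum == PN_XNUM case for >65534 segments
    for i in range(e_phnum):
        f.seek(e_phoff + i * e_phentsize)
        phdr = f.read(e_phentsize)
        if len(phdr) < e_phentsize:
            sys.exit('program header table is itself truncated')
        # in an ELF64 program header, p_offset is at offset 8
        # and p_filesz at offset 32
        (p_offset,) = struct.unpack_from('<Q', phdr, 8)
        (p_filesz,) = struct.unpack_from('<Q', phdr, 32)
        chunks.append((p_offset, p_filesz))

# drop zero-sized chunks, then the expected size is the largest offset + size
expected = max(off + size for off, size in chunks if size > 0)
print('expected core file size >= %d, found: %d%s'
      % (expected, actual, ' (truncated)' if actual < expected else ''))

If gdb and BFD derive their figure the same way, running this over the core in the question should print the same 43548672 that the warning reports.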

Truncated Core Files

gdb very likely performs a calculation similar to the one above to arrive at the expected core file size.

In short, if gdb says your core file is truncated, it is very likely truncated.

One of the most likely causes of truncated core dump files is the core file size ulimit. This can be set for all users (or for particular users) in /etc/security/limits.conf, or per shell session with the ulimit shell built-in [footnote: I don't know anything about systems other than my own].

Try the command "ulimit -c" to check your effective core file size limit:

$ ulimit -c
unlimited
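
If the limit turns out to be the culprit, it can be raised permanently in /etc/security/limits.conf (the core item is in KB, or the keyword unlimited) or just for the current shell with ulimit. The * entries below are only an illustration, assuming the usual pam_limits setup; adapt the domain and values to your own users:

# /etc/security/limits.conf
*               soft    core            unlimited
*               hard    core            unlimited

$ ulimit -c unlimited    # affects the current shell and the processes it starts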

Also, it's worth noting that gdb doesn't actually refuse to operate because of the truncated core file. It still attempts to produce a stack backtrace, and in your case it only fails when it tries to access data on the stack and finds that the specific memory locations addressed are beyond the end of the truncated core file.

  • Thanks for the help, future self! Oh wait, I *am* future self now. That means I'm talking to myself again, goddammit! – barryd Oct 04 '16 at 09:57

Highlighting an answer from Stack Overflow that addresses a similar issue: How to get core file greater than 2GB. According to its author, the truncation/overwrite issue is resolved by changing the defaults in /etc/systemd/coredump.conf.
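
For readers whose cores are handled by systemd-coredump (not the case for the asker's box, where core_pattern points at a plain filename), the relevant caps live in /etc/systemd/coredump.conf. The values below are only an illustration, not recommended settings:

[Coredump]
# raise these if your core dumps are larger than the distribution defaults
ProcessSizeMax=8G
ExternalSizeMax=8G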
