
I am trying to get the offsets/virtual addresses of the strings in the .rodata and .rodata1 sections.

For example:

#include <cstdio>

void myprintf(const char* ptr) {
        printf("%p\n", ptr);
}

int main() {
        myprintf("hello world");
        myprintf("\0\0");
        myprintf("ab\0cde");
}

According to the output of readelf -a, the above program has a .rodata section:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [16] .rodata           PROGBITS         0000000000400600  00000600

And readelf -W -p .rodata gives me the offsets and the associated non-empty strings:

String dump of section '.rodata':
  [    10]  %p^J
  [    14]  hello world
  [    23]  ab
  [    26]  cde

I would like to write C or C++ code to retrieve:

  1. The offsets of all the string literals (e.g. 10, 14, 23 above and the missing one for "\0\0")

  2. The string literals (e.g. "%p\n", "hello world", "\0\0" above)

  3. The offset to the file for .rodata (e.g. 400600 above; is it guaranteed to be the virtual memory address? At least I see it is the case for all the string literal in my test code above.)

My end goal is to write C/C++ code that reads in an executable and accepts a user-supplied offset/virtual memory address. If the input matches the offset/virtual memory address of any string literal, the program prints that literal with printf(). Otherwise, it ignores the input. (Thanks @Armali for helping me clarify.)

I have read this article. I am able to access the entire string table in .rodata but not "string table indexes". The article mentions "string table indexes" but it doesn't specify how to retrieve indexes.

Hints?

Also, I wonder why there could be a section called .rodata1. According to elf manpage:

.rodata1

This section holds read-only data that typically contributes to a nonwritable segment in the process image. This section is of type SHT_PROGBITS. The attribute used is SHF_ALLOC.

It has exactly the same description as .rodata. Then, why do we have .rodata1?

Thanks!

HCSF
  • Is parsing the output of `system("readelf -W -p .rodata")` and `system("readelf -a | grep .rodata | awk '{print $4}'");` not enough? – KamilCuk Aug 19 '18 at 20:56
  • @KamilCuk I came across a different output from readelf -a online. So I think it is safer to parse based on ELF via elf.h. – HCSF Aug 20 '18 at 01:46
  • @KamilCuk also I just found out `readelf -W -p .rodata` doesn't output the offset for null strings (such as "\0\0" and "\0"). Hence, I can't use readelf. Let me update my question. Thanks for your suggestion tho! – HCSF Aug 20 '18 at 07:39
  • All the .rodata sections are placed in the one .rodata section – 0___________ Aug 20 '18 at 13:13
  • @P__J__ ok, so what's `.rodata1`? – HCSF Aug 20 '18 at 14:42
  • look at your linker script and you will know – 0___________ Aug 20 '18 at 15:19
  • Would it be no issue to you if `cde` is found although it doesn't stem from a string literal on its own, and if `ab` is found instead of `ab\0cde`? – Armali Aug 21 '18 at 06:56
  • @Armali my end goal is this: if someone gives me the offset/index/virtual memory address of a string literal in an executable, I would like to print it out. And printing `ab\0cde\0` via printf() will become `ab\0` anyway; hence, `cde\0` as a separate artificial entry is okay. Extra artificial strings are okay. Missing ones aren't. Thanks for your help! – HCSF Aug 21 '18 at 07:20
  • @Armali I updated my question to reflect your question. Thanks! – HCSF Aug 21 '18 at 07:23
  • Hmm - isn't there a contradiction between the requirement to ignore non-matching addresses and the statement "_Extra artificial strings are okay_"? I mean, if extra strings are okay, how decide what is not okay - or put the other way round, why bother to decide at all, why not simply print any (maybe printable) string at the given address? – Armali Aug 21 '18 at 10:00
  • @Armali there is no contradiction but probably some confusion -- I have some way to guarantee that the caller to my function will only pass in valid offset/index/virtual memory address. Hence, my function will never ever hit the extra artificial strings. Hence, those extra artificial strings sitting in the map are absolutely okay. – HCSF Aug 21 '18 at 13:09
  • Your last question is very inspiring. Actually, given that the caller will only pass in a valid offset/index/virtual memory address, I probably can use the offset to look up the "printable string" as you suggested. I can play with it a bit more. But of course, if the ELF sections contain all the valid offsets, why not using them? Then, I can relax the requirement that the caller has to pass in only valid offsets. Hope it is not unreasonable. – HCSF Aug 21 '18 at 13:13
  • Well, _if the ELF sections contain all the valid offsets_… - the problem here is that those _valid offsets_ are not collected anywhere where they can conveniently be accessed, rather they are dispersed within machine code instructions in the `.text` segment. Some (global) strings have entries in the symbol table (if not stripped), but many strings have only local labels and their address offsets simply are not recorded in any table. Did `readelf -p …` lead you to think such a table existed? That's not the case, `readelf -p …` just goes thru the section and prints all that looks like a string. – Armali Aug 21 '18 at 13:30
  • Yes, indeed `readelf -p`'s output makes me think the offsets are stored in elf. And [readelf source code](https://github.com/bminor/binutils-gdb/blob/master/binutils/readelf.c) is quite complicated (see `dump_section_as_strings()`), which makes me think I couldn't just parse the string table by using `\0` as the delimiter. – HCSF Aug 21 '18 at 13:51
  • @Armali would you mind updating your answer with "ELF sections don't contain offsets/indexes to the string literals"? Then, I will accept your answer. Thanks. – HCSF Aug 21 '18 at 14:16
  • I updated my answer. And indeed `readelf`'s `dump_section_as_strings()` is quite complicated, but most of the complication comes from the decompression (option `-z`); the actual string search is in the [`while` loop at the end of the function](https://github.com/bminor/binutils-gdb/blob/master/binutils/readelf.c#L13439), which just scans for printable characters. – Armali Aug 22 '18 at 07:40

2 Answers


I am trying to get offsets, strings and virtual addresses in .rodata and .rodata1 sections.

I would like to write a C or C++ code to retrieve:

  1. The offsets of all the string literals (e.g. 10, 14, 23 above and the missing one for "\0\0")

  2. The string literals (e.g. "%p\n", "hello world", "\0\0" above)

A string literal is a sequence of characters enclosed in double-quotes. We practically cannot tell what in an ELF data section is a representation of a string literal. Consider these lines added to your main():

        static const int s = '\0fg\0';
        myprintf((char *)&s);

Although there is no string literal, readelf -p .rodata … may output a line such as

  [    21]  gf

So, to truly recognize representations of string literals in a data section, it would be necessary to correlate the data with source code tokens (difficult) or assembler code (perhaps easier).
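To illustrate why such a scan cannot tell string literals from other data, the kind of search readelf -p performs can be approximated as below. This is only a minimal sketch, not readelf's actual code: the helper name scan_printable_strings and the StringHit alias are made up for illustration, and the section's bytes are assumed to be already loaded into memory.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One hit of the scan: offset into the section, and the bytes up to the next NUL.
using StringHit = std::pair<std::size_t, std::string>;

// Hypothetical helper: skip everything that is not printable, then report the
// run from the first printable byte up to the terminating NUL (or section end).
std::vector<StringHit> scan_printable_strings(const std::vector<unsigned char>& section) {
    std::vector<StringHit> hits;
    std::size_t i = 0;
    while (i < section.size()) {
        // Skip NULs, control bytes and any other non-printable bytes.
        while (i < section.size() && !std::isprint(section[i])) ++i;
        if (i >= section.size()) break;
        const std::size_t start = i;
        // Consume up to, but not including, the next NUL byte.
        while (i < section.size() && section[i] != 0) ++i;
        hits.emplace_back(start, std::string(section.begin() + start, section.begin() + i));
    }
    return hits;
}
```

Note that a scan like this never reports a run consisting only of NUL bytes, which is consistent with the "\0\0" literal missing from the dump in the question, and it happily reports `cde` or the bytes of an int as if they were strings.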

it would be an issue to me that if a string literal doesn't exist in .rodata

This can easily happen. Consider:

        static char hello[] = "Hi";
        myprintf(hello);

Since the string literal is used to initialize a character array, which has to be modifiable, it can go into the .data instead of the .rodata section, as readelf -p .data … may show.

if the ELF sections contain all the valid offsets, why not using them?

The valid offsets are not collected anywhere where they can conveniently be accessed, so for practical purposes we can say ELF sections don't contain offsets/indexes to the string literals.
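Since no such table exists, one pragmatic option (the one discussed in the comments) is to treat any address inside the section as a possible string start and simply read up to the next NUL. The following C++17 sketch assumes the section header fields and the section bytes have already been read from the file; SectionInfo, vaddr_to_file_offset and string_at are hypothetical names, with the fields named after sh_addr, sh_offset and sh_size of Elf64_Shdr.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the section header fields needed here.
struct SectionInfo {
    std::uint64_t sh_addr;    // virtual address of the section's first byte
    std::uint64_t sh_offset;  // file offset of the section's first byte
    std::uint64_t sh_size;    // section size in bytes
};

// Translate a virtual address inside the section to a file offset.
std::optional<std::uint64_t> vaddr_to_file_offset(const SectionInfo& sec, std::uint64_t vaddr) {
    if (vaddr < sec.sh_addr || vaddr - sec.sh_addr >= sec.sh_size) return std::nullopt;
    return sec.sh_offset + (vaddr - sec.sh_addr);
}

// Return the NUL-terminated byte string starting at `vaddr`, or nullopt if the
// address lies outside the section or no terminator follows it.
std::optional<std::string> string_at(const SectionInfo& sec,
                                     const std::vector<unsigned char>& bytes,
                                     std::uint64_t vaddr) {
    if (vaddr < sec.sh_addr) return std::nullopt;
    const std::uint64_t rel = vaddr - sec.sh_addr;
    if (rel >= sec.sh_size || rel >= bytes.size()) return std::nullopt;
    std::string out;
    for (std::size_t i = static_cast<std::size_t>(rel); i < bytes.size(); ++i) {
        if (bytes[i] == 0) return out;  // terminator found; an empty string is a valid result
        out.push_back(static_cast<char>(bytes[i]));
    }
    return std::nullopt;  // ran off the end of the section without a NUL
}
```

With the section from the question (address 400600, offset 600), a call such as string_at(sec, bytes, 0x400614) would yield "hello world". This approach cannot reject addresses that point into the middle of a string or at non-string data, so it relies on the caller passing only valid addresses.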


I am able to access the entire string table in .rodata but not "string table indexes". The article mentions "string table indexes" but it doesn't specify how to retrieve indexes.

The string table indexes are not mentioned in connection with .rodata, but with the string table section .strtab:

This section holds strings, most commonly the strings that represent the names associated with symbol table entries.
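To make the term concrete: a string table index is just a byte offset into the string table, such as the st_name field of a symbol table entry, and the name is the NUL-terminated string that starts there. A minimal sketch, assuming the contents of .strtab have already been read into memory (name_at is a hypothetical helper, not part of any library):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Return the NUL-terminated name that starts at byte `index` of a string table.
// An out-of-range index yields an empty string in this sketch.
std::string name_at(const std::vector<char>& strtab, std::uint32_t index) {
    std::string name;
    for (std::size_t i = index; i < strtab.size() && strtab[i] != '\0'; ++i) {
        name.push_back(strtab[i]);
    }
    return name;
}
```

For a table holding the bytes `\0main\0_IO_stdin_used\0`, index 1 names `main` and index 6 names `_IO_stdin_used`. These indexes come from other ELF structures that refer to the table; nothing comparable refers to the strings in .rodata.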

Armali
  • Thanks for your reply. I actually don't mind seeing non string literal in `.rodata` section. Tho, it would be an issue to me that if a string literal doesn't exist in `.rodata` (assuming linking to static libraries only). So you are saying that `.strtab` section has the string table indexes for `.rodata`? But based on what you quoted, it sounds like `.strtab` only holds string but not the offset/index. Maybe I misunderstand. Do you mind elaborating? Thanks! – HCSF Aug 20 '18 at 14:41
  • No, I'm not _saying that `.strtab` section has the string table indexes for `.rodata`_, but rather that `.strtab` holds _symbol table_ names and so those _string table indexes_ are irrelevant for your goal. – Armali Aug 21 '18 at 06:08
  • Concerning the mentioned _issue_, I'll amend the answer. – Armali Aug 21 '18 at 06:27
  • Your counter example (static array) is excellent. Thanks for pointing that out. Then, now, I have to think of how to handle it as well. Thanks! – HCSF Aug 21 '18 at 14:22
  • Thanks for updating. Just a side but related question, do you know what the first 16 bytes are in `.rodata`? I noticed that it has 1 0x1 and 1 0x2 and then the rest is 0x0. Thanks in advance. – HCSF Aug 22 '18 at 07:57

Just a side but related question, do you know what the first 16 bytes are in .rodata? I noticed that it has 1 0x1 and 1 0x2 and then the rest is 0x0.

This is not always so; it simply depends on what read-only data the program uses. For example, if I compile your example program, the string %p\n starts at offset 4, and preceding that I also have 1 and 2 (as 16-bit words), but no zeros. Looking further for the symbol at the start of .rodata with

> readelf -s … | grep 400738
    14: 0000000000400738     0 SECTION LOCAL  DEFAULT   14
    59: 0000000000400738     4 OBJECT  GLOBAL DEFAULT   14 _IO_stdin_used

(400738 being the .rodata start address here), I get _IO_stdin_used, a global object of size 4, which sounds like something from the standard library.
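The same lookup could be done in code rather than with grep. The sketch below assumes the symbol table has already been parsed into a simplified record; SymbolInfo and symbols_at are hypothetical names, with fields modeled on st_value and st_size of Elf64_Sym and the name already resolved through .strtab.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one symbol table entry.
struct SymbolInfo {
    std::uint64_t st_value;  // address of the symbol
    std::uint64_t st_size;   // size of the object in bytes (0 if unknown)
    std::string name;        // name resolved via the string table; empty for section symbols
};

// Collect the names of all named symbols whose address equals `addr`.
std::vector<std::string> symbols_at(const std::vector<SymbolInfo>& symtab, std::uint64_t addr) {
    std::vector<std::string> names;
    for (const SymbolInfo& sym : symtab) {
        if (sym.st_value == addr && !sym.name.empty()) names.push_back(sym.name);
    }
    return names;
}
```

Applied to the two entries shown above, this would return only `_IO_stdin_used`, because the SECTION entry at the same address has no name of its own.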

Armali