The addresses of string literals are determined at compile time. Both the address and the literal itself can be found in the built executable (in ELF format). For example, the following code outputs String Literal: 0x400674

printf("String Literal: %p\n", "Hello World");   

And objdump -s -j .rodata test1 shows

Contents of section .rodata:

400670 01000200 48656c6c 6f20576f 726c6400 ....Hello World.

....

So it looks like I can get the virtual address of "Hello World" by reading the executable program itself.
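The arithmetic behind that observation can be checked with a small sketch. It assumes a full objdump -s row (an address followed by four 8-digit hex groups and then the ASCII column); parse_objdump_line is a hypothetical helper name, not part of any tool.

```python
def parse_objdump_line(line: str):
    """Split one full `objdump -s` row into (start_address, raw_bytes).

    Assumption: the row holds exactly four hex groups (16 bytes), so the
    ASCII column starts at the sixth whitespace-separated field.
    """
    fields = line.split()
    base = int(fields[0], 16)
    data = bytes.fromhex("".join(fields[1:5]))
    return base, data


base, data = parse_objdump_line(
    "400670 01000200 48656c6c 6f20576f 726c6400 ....Hello World."
)
# The literal starts 4 bytes into the row, so its address is 0x400670 + 4.
addr = base + data.index(b"Hello World")
print(hex(addr))  # 0x400674
```

The result matches the address printed by the printf call, at least for an executable that is loaded at its link-time address.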

Question: How can I build a table/map/dictionary between the address of each string literal and the string itself, by reading the ELF file?

I am trying to write a standalone Python script or C++ program that reads the ELF file and generates the table. It's OK if the table contains extra mappings (entries that are not string literals), as long as it contains the mapping for every string literal.

Peng Zhang
  • It's ELF not EFL. You find yourself an ELF reading library and use it, or find yourself ELF format specs for the versions you're interested in and write your own. Library recommendations are off topic here. – Tony Delroy Feb 20 '15 at 05:44
  • Why do you ask? Do you know the `strings` command? – Basile Starynkevitch Feb 20 '15 at 06:20
  • Please edit your question to improve it and give some motivation... – Basile Starynkevitch Feb 20 '15 at 06:32
  • @BasileStarynkevitch The motivation is passing string literal to another process with minimal overhead. Since we can get string literal from exe file, there is no need to copy the string to a memory so that another process could use. – Peng Zhang Feb 20 '15 at 06:37
  • Please edit your question. And you are absolutely wrong: two different processes have different address spaces in virtual memory, by definition! – Basile Starynkevitch Feb 20 '15 at 06:38
  • @BasileStarynkevitch Another program/process Y reads the exe file X, create a map where key is the virtual address of string literal in X, and value is the string literal. Y is not accessing the address in X, of course. – Peng Zhang Feb 20 '15 at 06:41
  • You should improve your question and give a use case. I believe that you are misunderstanding many things... – Basile Starynkevitch Feb 20 '15 at 06:53
  • @BasileStarynkevitch I will look up more detail in ELF format and then post another question to describe what I am fiddling with. I am just curious, if we could know the string content from the exe file, why do we need to pass the whole string to another process in run time. We could just pass an integer, saving a lot especially when the string is long. Just think this as a very interesting question. However, seems like people are not doing this way. – Peng Zhang Feb 20 '15 at 06:59

1 Answer

I am not sure your question always makes sense. The details are implementation specific (they depend on the operating system, the compiler, and the compilation flags).

First, a compiler which sees both "abcd" and "cd" literal strings in the same translation unit is permitted (but not required) to share their storage and use "abcd"+2 as the second one. See this answer.
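One consequence for an address table: if storage may be shared like this, any suffix of a stored string is also a possible literal address. A minimal sketch of that expansion follows, assuming one byte per character (plain ASCII); with_suffixes is a hypothetical helper name.

```python
def with_suffixes(table: dict) -> dict:
    """Expand {address: text} so that every suffix of each string gets its
    own entry, e.g. "abcd" at A also yields "cd" at A + 2.

    Assumption: one byte per character (ASCII), so offsets equal indices.
    """
    expanded = {}
    for addr, text in table.items():
        for i in range(len(text)):
            # setdefault keeps the first entry if two ranges overlap.
            expanded.setdefault(addr + i, text[i:])
    return expanded


table = with_suffixes({0x1000: "abcd"})
print(table[0x1002])  # cd
```

This makes the table larger, but the question allows extra entries as long as every real literal address is covered.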

Then, in ELF files, strings are simply initialized read-only data (often in the .rodata or .text section of the text segment), and they could happen to be the same as some non-string constants. ELF files do not keep any typing information (except as debug DWARF information when compiled with -g). In other words, the following

const uint8_t constable[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0 };

has exactly the same machine representation as the "hello" string literal, but is not a string literal in the source. Even worse, some parts of the machine code could happen to look like strings.
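Because the file carries no type information, about the best a scanner can do is treat every NUL-terminated run of printable bytes as a candidate string. A minimal sketch, assuming ASCII text and a known load address for the bytes being scanned; string_table and its parameters are hypothetical names.

```python
import re


def string_table(base_addr: int, data: bytes, min_len: int = 1) -> dict:
    """Map virtual address -> text for each NUL-terminated run of
    printable ASCII bytes (plus tab and newline) found in `data`.

    Assumption: `base_addr` is the virtual address of data[0].
    """
    pattern = re.compile(rb"([\x20-\x7e\t\n]{%d,})\x00" % min_len)
    return {
        base_addr + m.start(1): m.group(1).decode("ascii")
        for m in pattern.finditer(data)
    }


rodata = bytes.fromhex("01000200") + b"Hello World\x00"
print(string_table(0x400670, rodata))  # {4195956: 'Hello World'}

# A byte array with the same contents is reported just like a literal.
print(string_table(0x1000, bytes([0x68, 0x65, 0x6C, 0x6C, 0x6F, 0])))
```

The second call shows the false positive described above: the scanner cannot tell a uint8_t array from a string literal, so the table is a superset of the real literals.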

BTW, you could use the strings(1) command, or perhaps study its source code and adapt it for your needs.

See also dladdr(3) and this question.

Bear in mind that two different processes have (by definition!) different address spaces in virtual memory. Read also about ASLR. Also string literals may occur in shared objects (e.g. shared libraries like libc.so) which are often mmap-ed in different address segments (so the same literal string would have different addresses in different processes!).

You might be interested in libelf, readelf(1), or bfd to read the ELF file.
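If you would rather not depend on a library, the section headers can also be read by hand. The following is only a sketch under stated assumptions: it handles little-endian ELF64 files only, ignores the extended-numbering special cases (e_shnum of 0, SHN_XINDEX), and lets a later section overwrite an earlier one with the same name. read_sections is a hypothetical helper name.

```python
import struct

# Field layouts of the ELF64 file header and section header (little-endian).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_NOBITS = 8  # sections such as .bss occupy no bytes in the file


def read_sections(image: bytes) -> dict:
    """Return {section_name: (virtual_address, raw_bytes)} for an ELF64 image."""
    ehdr = EHDR.unpack_from(image, 0)
    ident = ehdr[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    shoff, shentsize, shnum, shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]

    headers = [SHDR.unpack_from(image, shoff + i * shentsize) for i in range(shnum)]

    # The section-name string table is itself a section (index e_shstrndx).
    _, _, _, _, str_off, str_size, *_ = headers[shstrndx]
    strtab = image[str_off:str_off + str_size]

    sections = {}
    for name_off, sh_type, _flags, addr, off, size, *_rest in headers:
        end = strtab.index(b"\x00", name_off)
        name = strtab[name_off:end].decode("ascii", "replace")
        data = b"" if sh_type == SHT_NOBITS else image[off:off + size]
        sections[name] = (addr, data)
    return sections
```

For example, read_sections(open("test1", "rb").read())[".rodata"] would give the address and bytes of the .rodata section, which is the region objdump -s -j .rodata prints.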

Basile Starynkevitch
  • Thank you. I do use strings, but it seems only output strings, without the addresses. Do you have reference describing the "First" paragraph in your answer. If the compiler could reuse storage, the mapping is not that interesting for my use case. As for your comment on the non-string constants, that's OK as said in my question. As long as the table contains the mapping of the string literal address, the table is good for me. – Peng Zhang Feb 20 '15 at 06:31
  • Thank you very much. I just confirmed the overlapping of string literals in .rodata! I have a test program prints out the address of "Hello", "ello", "llo", "lo", "o". With `g++ -Os`, those addresses are just one byte away. And the .rodata shows there is only one string literal in the ELF rodata section. – Peng Zhang Feb 20 '15 at 07:08