
I am learning assembly and low-level programming, and I am reading a book about it. The book says that we can put any data inside the .text section of an ELF file, but of course we cannot modify it because of the page/segment permissions. It does not explain, however, why anyone would want data inside the .text section. I was also told by many C++ programmers that the g++ compiler puts

static const char DATA[] = "SOME DATA";

inside the .text section too. Why not put this data in the .rodata section instead; what is the purpose? And if .text is used for this, what is stored in .rodata then?

I am mainly asking about this behaviour in x86-64 long mode.

VP.
  • I tried it and g++ put `DATA` in `.rodata`. – melpomene Jun 26 '18 at 12:04
  • [But it is stored in `.rodata`](https://godbolt.org/g/eVBSSK). – Hatted Rooster Jun 26 '18 at 12:04
  • "*I was also told by many C++ programmers*" [citation needed] – melpomene Jun 26 '18 at 12:10
  • @melpomene They were just colleagues at work, unfortunately. While I see (thanks to you) that it is not in `.text`, the answer by @fuz answers my question anyway. I think I should delete the C++ code and the c++ tag then; what do you think? – VP. Jun 26 '18 at 12:15
  • @VictorPolevoy No, I think your tags are fine. – melpomene Jun 26 '18 at 12:16
  • Also, if you're writing for a system where code can be executed directly from ROM chips, like in some MCUs, or older (read retro) systems, you won't need to copy the data to RAM to use it. – Thomas Jager Jun 26 '18 at 12:36
  • @SombreroChicken is right: static read-only data (like string literals) goes in the `.rodata` *section* on Linux, or `.rdata` on Windows. That section gets linked into the TEXT *segment* of the executable. [What's the difference of section and segment in ELF file format](https://stackoverflow.com/q/14361248) – Peter Cordes Jun 26 '18 at 20:03

2 Answers


Traditionally, read-only data was placed in the text section for two reasons:

  • the text section is not writable, so memory protection catches accidental writes to read-only data and makes your program crash instead of silently corrupting the data
  • with a memory-management unit (MMU), multiple processes running the same program can share one copy of the text section (as it's guaranteed to be the same in all instances of the program), saving memory

On ELF targets, this scheme was modified a bit. Read-only data is now placed in the newer .rodata section, which is like the .text section except that it is also not meant to be executed, which prevents certain attack vectors. The advantages remain.
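To see which protection a read-only object actually ends up with at run time, one option is to look up its address in the process's memory map. The following is a minimal sketch that assumes Linux with `/proc` mounted; `mapping_perms` and `data_perms` are hypothetical helper names introduced only for this illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Read-only object, as in the question.
static const char DATA[] = "SOME DATA";

// Returns the permission field (e.g. "r--p" or "r-xp") of the mapping that
// contains `addr`, or an empty string if no mapping is found or
// /proc/self/maps cannot be read. Linux-specific assumption.
std::string mapping_perms(const void* addr) {
    const auto target = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line begins with "start-end perms", with addresses in hex.
        std::istringstream fields(line);
        std::string range, perms;
        if (!(fields >> range >> perms)) continue;
        const auto dash = range.find('-');
        if (dash == std::string::npos) continue;
        const auto start = std::stoull(range.substr(0, dash), nullptr, 16);
        const auto end = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (target >= start && target < end) return perms;
    }
    return {};
}

// Permissions of the mapping that holds DATA.
std::string data_perms() {
    return mapping_perms(DATA);
}
```

Printing `data_perms()` from `main` should show a mapping without `w`. Whether it also lacks `x` depends on how the linker laid out the segments, so the result may differ between toolchain versions.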

fuz
  • Also, when building binaries for persistent memory chips (ROM/...), the ".data" is usually in volatile DRAM, whose contents are lost when power is lost, and on embedded systems ".text" and ".rodata" are often effectively the same section. On some platforms, constants are also interleaved directly with code to allow simple addressing relative to the instruction pointer; having them in ".rodata" could introduce an extra pointer if that section is not at a fixed offset from ".text", and some platforms prefer short offsets for encoding. (Cache locality may also boost performance.) – Ped7g Jun 26 '18 at 12:53
  • I.e. the question of ".rodata" vs ".text" is quite subtle, and the answer is mostly "because it has some minor advantages on modern platforms in terms of protection", but they are very similar... if the question were "why read-only, why not .data and just initialize them", it would be much simpler and less subtle to answer... :) – Ped7g Jun 26 '18 at 12:57
  • The `.rodata` section is part of the text segment, and is part of the same mapping at run-time. (So it is executable.) That's why [putting machine code in a string literal](https://codegolf.stackexchange.com/a/114619/30206) like `L"\xf33f048d\xc3c0520f"` still works without `-zexecstack`. Hmm, I checked with `readelf -a /bin/ls`, though, and the section didn't have the `X` flag. But in practice it ends up in the same mapping with executable code. – Peter Cordes Jun 26 '18 at 20:08
  • @PeterCordes Sections and program headers do not need to agree on anything. In fact, they could reference totally separate parts of the binary. Still odd that the linker script throws the two together. Perhaps some legacy software assumes rodata is executable? – fuz Jun 26 '18 at 21:01
  • @fuz: Or else it's for efficiency: one larger mapping instead of two separate ones is faster for the kernel to create, and consumes one fewer entry in the list of mappings. And you don't have to pad to a page boundary. (Actually I guess you'd just have `.rodata` start in the middle of a page so the mapping could align with 4k offsets into the executable file. It's fine to have the same page mapped twice or 3 times; as rodata, text, and data.) – Peter Cordes Jun 26 '18 at 21:12
  • There does seem to be a `r--` mapping as well as the `r-x` (text) and `rw-` (data) mappings in compiler output from throwing that code-golf hack into a file. But the string literal is in the same mapping as `main`, so it is executable. (I set a breakpoint and single-stepped). Oh, I think that's something else; the first 4 bytes of the page are `127 '\177' 69 'E' 76 'L' 70 'F'`, so it's probably some metadata. IDK if they could have put .rodata into this segment. – Peter Cordes Jun 26 '18 at 21:24
  • @PeterCordes That's the ELF header. Doing it this way would mean that the rodata section has to appear first in the file (immediately after the ELF header), and thus either rodata would precede text or the mapping would not be in the order in which the segments appear in the binary. Both would be a stark departure from historical practice and might break some programs. – fuz Jun 26 '18 at 21:54
  • Oh right. It's a small file, and both the `r--` and `r-x` mappings start at offset 0 into the file. So they fully overlap, and I don't know which addresses in the `r--` mapping actually get referenced. – Peter Cordes Jun 26 '18 at 21:56
  • Update on this: a recent version of `ld` changed to linking `.rodata` into its own non-executable ELF segment, so `const char code[] = { 0xc3 };` no longer works when cast to a function pointer, without `-zexecstack`. It *used* to "just work" to put machine code in a const array or string literal. – Peter Cordes May 22 '19 at 13:36
  • Update 2: Recent Linux kernels (5.5 or so) [changed the meaning of `-z execstack`](https://stackoverflow.com/questions/64833715/linux-default-behavior-of-executable-data-section-changed-between-5-4-and-5-9) to actually make only the stack executable, not READ_IMPLIES_EXEC. Fixing most of [Unexpected exec permission from mmap when assembly files included in the project](https://stackoverflow.com/q/58260465). See [How to get c code to execute hex machine code?](https://stackoverflow.com/q/9960721) (including my answer for stuff like `__attribute__((section(".text")))`) – Peter Cordes May 24 '21 at 19:07

A lot of correct things have been said here. I will make some additions and clarifications.

  • The fact that we can put constant data in .text does not mean that we should. We can only because instructions and data are both just binary numbers.
  • It also does not mean that modern compilers (always) do it.
  • The .rodata, .text and other sections are largely an implementation detail.
  • It is true that big chunks of const data are often stored in .rodata. However, in your case, a sufficiently small static const string may simply get inlined into the instruction stream where it is used. The string itself, which ought to be placed in .rodata, may then be optimized out, but its contents, split across several instructions, will de facto be stored in .text.
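The inlining described in the last point can be tried with a fixed-size copy of the string from the question. This is only a sketch: `copy_data` is a hypothetical function name, and whether the bytes end up as immediates in the instruction stream depends on the compiler, the optimization level and the target.

```cpp
#include <cstring>

// The string from the question: 9 characters plus the terminating NUL.
static const char DATA[] = "SOME DATA";

// Copies DATA into `out`, which must have room for sizeof DATA bytes.
// The size is a compile-time constant, so an optimizing compiler may
// replace the memcpy call with a few store instructions that carry the
// string bytes as immediate operands instead of loading from .rodata.
void copy_data(char* out) {
    std::memcpy(out, DATA, sizeof DATA);
}
```

Compiling this with something like `g++ -O2 -S` and reading the generated assembly shows whether the string still appears as a separate object or only as constants embedded in the instructions of `copy_data`.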
Igor Zhirkov
  • Fun fact: ARM traditionally puts small constant data between functions ("literal pools") so they're reachable with PC-relative load instructions. But compilers usually just put a pointer to the real static data if it's bigger than a register, not a whole string. And yeah, [compilers don't mix code and data at all on x86, although obfuscators might](https://stackoverflow.com/questions/55607052/why-do-compilers-put-data-inside-textcode-section-of-the-pe-and-elf-files-and). – Peter Cordes May 24 '21 at 19:11