
I'm trying to learn ARM assembly.

After writing this small Hello World program:

                .global _start

                .text
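_start:         ldr     R1,=msgtxt      @ R1 = address of the message
                mov     R2,#13          @ R2 = message length in bytes
                mov     R0,#1           @ R0 = file descriptor 1 (stdout)
                mov     R7,#4           @ R7 = 4: write system call
                svc     0               @ call the kernel

                mov     R7,#1           @ R7 = 1: exit system call (status is whatever R0 holds)
                svc     0               @ call the kernel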


                .data
msgtxt:         .ascii  "Hello World!\n"

                .end

I noticed that I could remove the .text and .data directives and the program would work just as well.

I'm therefore curious: everything I read emphasized that the .text section is to be used for code and .data for data. But here, before my eyes, they seem to do nothing at all!

Therefore, if these are not used to hold code and data respectively, what is their true purpose?

Peter Cordes
Ykon O'Clast

2 Answers


Those sorts of directives depend on what architecture you're building your program for, and they choose which memory section to assign to whatever code or data follows. In the end, everything is just a string of bytes. After your program is assembled, the symbols/labels will be assigned different memory addresses according to what section they're in.

.text is generally allocated in a read-only memory section, most suitable for code that isn't expected to change.

.data is typically a writable section of memory. I believe that it's quite common to put your string in .text right next to your code if it isn't expected to change (or in a similar read-only section, if the architecture has one). I would say that the .data section is even avoided most of the time. Why? Because the .data section needs to be initialized: its contents are copied from the program binary into memory when the program starts. Most data that your program references can be read-only, and any scratch memory the program needs is usually just allocated in the .bss section, which reserves a region of uninitialized memory.
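As a rough sketch of that layout, here is the question's program with the string moved to a read-only section and a scratch buffer reserved in .bss. It assumes GNU as targeting 32-bit ARM Linux (EABI system calls); the label `scratch` and its 64-byte size are made up for illustration and the program never uses them.

```asm
                .global _start

                .text
_start:         ldr     r1, =msgtxt     @ r1 = address of the message
                mov     r2, #13         @ r2 = message length in bytes
                mov     r0, #1          @ r0 = file descriptor 1 (stdout)
                mov     r7, #4          @ r7 = 4: write system call
                svc     0

                mov     r0, #0          @ r0 = exit status 0
                mov     r7, #1          @ r7 = 1: exit system call
                svc     0

                .section .rodata        @ read-only data, kept apart from the code
msgtxt:         .ascii  "Hello World!\n"

                .bss                    @ zero-filled at load time, takes no space in the file
scratch:        .space  64              @ hypothetical 64-byte work buffer (unused here)
```

Nothing here needs .data, because the only initialized data is never written to.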

Mixing code and data in the same section has some advantages, such as easy access to the address of the data as a relative offset from the PC register (the address of the code being executed). The disadvantage is that if you try to modify read-only memory, at the very least the write will be ignored, and the program might trigger an exception and crash. This is all very architecture-specific, and the safest bet is to keep code in sections meant for code, and data/allocations in sections meant for data.
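The PC-relative advantage can be sketched as follows, assuming GNU as in ARM (A32, not Thumb) mode on Linux. With the string placed in .text right after the code, `adr` can compute its address from the PC, so no literal pool entry or absolute relocation is needed for it.

```asm
                .global _start

                .text
_start:         adr     r1, msgtxt      @ r1 = PC-relative address of the message
                mov     r2, #13         @ r2 = message length in bytes
                mov     r0, #1          @ r0 = file descriptor 1 (stdout)
                mov     r7, #4          @ r7 = 4: write system call
                svc     0

                mov     r0, #0          @ r0 = exit status 0
                mov     r7, #1          @ r7 = 1: exit system call
                svc     0

msgtxt:         .ascii  "Hello World!\n" @ same section as the code, so read-only at run time
```

This only works while the label stays within the range that `adr` can encode, which is one reason it suits small data kept close to the code.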

It's all very specific to what your program is targeting. For example, the Game Boy Advance had a 256KB "slow" memory region, a 32KB "fast" memory region, and then the read-only "ROM" region (the game cartridge data) which can be several megabytes, and assemblers used these memory sections:

.data or .iwram  -> Internal RAM (32KB)
.bss             -> Internal RAM uninitialized
.ewram           -> External RAM (256KB)
.sbss            -> External RAM uninitialized
.text or .rodata -> Read only ROM (cartridge size)

To give another example, the SPC-700 (SNES sound chip) had 64KB of readable and writable memory that was used for everything, but the first 256 bytes of it had faster access (the "zero page"). In this theoretical case, .data and .text would be assigned to the same memory region--that is, they would not be allocated in the zero-page, and they both share the same memory. There would be a custom segment for the zero-page, and the difference between .text and .data would be very little - just a way to distinguish which symbols in the assembled program point to "data" and which symbols point to program code.

mukunda
  • Dude, your post put me in the way back machine. :) It should be made clear that modifying ROM memory will crash a program, so the directives matter quite a bit. Also, for speed reasons as you mention, they mattered quite a bit as well. On modern OSes though, where execution rights are carefully controlled, putting code in a data section will cause a security fault. – Michael Dorgan Mar 11 '19 at 18:55
  • I was quite the Nintendo homebrew enthusiast. I miss all this stuff. :( – mukunda Mar 11 '19 at 19:00
  • That is very interesting, thank you. Could you tell me what is, then, the "default" behaviour if no directive is used? Is all the code put in writable memory? – Ykon O'Clast Mar 11 '19 at 19:49
  • That's not really something that's clearly defined and would be dependent on your assembler. I would guess that it usually defaults to the `.text` section, since typically you are assembling code and not data. – mukunda Mar 11 '19 at 19:56
  • @mukunda - I'm still an enthusiast. I actually lead the Nintendo compiler team at the moment :) – Michael Dorgan Mar 11 '19 at 21:29
  • @MichaelDorgan That's awesome. I'm not too into console things lately, but Nintendo sure had some fun hardware to play with. – mukunda Mar 11 '19 at 21:36
  • @mukunda: yes, most assemblers default to `.text` or `.code` at the top of the file. – Peter Cordes Mar 12 '19 at 01:11
  • That was also a great answer, it was hard to choose. – Ykon O'Clast Mar 13 '19 at 08:41
  • I agree, especially the point about separate cache lines for instructions and data on modern systems. Not only is that really relevant to the question, but it's also just good advice in general--cache misses are a very high impact against performance and minimizing that by keeping relevant data together is key for better performance (so long as the targeted architecture agrees). – mukunda Mar 13 '19 at 19:31

GAS (like most assemblers) defaults to the .text section, and your read-only data still works in .text

Everything is just bytes


You can do echo 'mov r1, #2' > foo.s and assemble+link that into an ARM binary (with gcc -nostdlib -static foo.s, for example). You can single-step that instruction in GDB.

(Without a sys_exit system call your program will crash after that instruction, but of course you could add that too, still without any directives.)

The linker will warn that it didn't find a _start symbol (because you left out the label itself, not to mention the .globl directive that tells the assembler to make it visible in the object file's symbol table).

But GNU binutils ld's default is to use the start of the .text section as the ELF entry point.
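A minimal sketch of such a directive-free file, assuming 32-bit ARM Linux and the build command quoted above. It keeps the single `mov` from the echo example and adds an exit system call so the program ends cleanly instead of running off the end of the code.

```asm
@ foo.s: no .text, no .global, no _start label.
@ Build as in the answer: gcc -nostdlib -static foo.s
                mov     r1, #2          @ the single instruction from the echo example
                mov     r0, #0          @ r0 = exit status 0
                mov     r7, #1          @ r7 = 1: exit system call
                svc     0               @ call the kernel
```

The linker should still warn about the missing _start symbol, but since these instructions sit at the start of .text, execution begins at the first `mov`.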

Most sections other than .text aren't linked into executable memory by default, so having _start: in .data would normally be a problem.


Read-only data should normally go in the .rodata section, which is linked as part of the TEXT segment anyway. So as far as runtime behaviour is concerned, placing it at the end of the .text section (by leaving out .data) is pretty much exactly equivalent to what you should have done.

See also: What's the difference of section and segment in ELF file format

Putting it in .data leads to the linker putting it in a different segment that tells the OS's ELF program loader to map it read+write (and not execute).
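One way to see the directive make a difference is to write to the string before printing it. This is a sketch under the same assumptions as the question (GNU as, 32-bit ARM Linux, default linker script); the one-byte store is added purely for illustration.

```asm
                .global _start

                .text
_start:         ldr     r1, =msgtxt     @ r1 = address of the message
                mov     r3, #0x4A       @ ASCII code of J
                strb    r3, [r1]        @ overwrite the first byte: needs a writable mapping
                mov     r2, #13         @ r2 = message length in bytes
                mov     r0, #1          @ r0 = file descriptor 1 (stdout)
                mov     r7, #4          @ r7 = 4: write system call
                svc     0

                mov     r0, #0          @ r0 = exit status 0
                mov     r7, #1          @ r7 = 1: exit system call
                svc     0

                .data                   @ without this line the string lands in .text
msgtxt:         .ascii  "Hello World!\n"
```

With the .data directive in place the store succeeds and the modified string is printed. Without it, the string ends up in the read+execute mapping along with the code, and on a typical Linux system the store would be expected to fault with SIGSEGV.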

The point of having a .rodata section separate from .text is to group code together and data together. Many CPUs have split L1d and L1i caches, and/or separate TLBs for data / instructions, so fine-grained mixing of read-only data with code wastes space in split caches.

In your case, you're not linking any other files that also have some code and some data, so there's no difference.

Peter Cordes
  • Thank you for these clarifications, but could you elaborate on the "Many CPUs have split L1d and L1i caches, and/or separate TLBs for data / instructions, so fine-grained mixing of read-only data with code wastes space in split caches." That was not entirely clear to me (though it seems interesting). – Ykon O'Clast Mar 12 '19 at 13:40
  • @YkonO'Clast: If you have a mix of both code and data in two 64-byte chunks of RAM, they will both have to get loaded into both L1i and L1d cache. But if one of them is pure code, the other is pure data, then the data-only cache line will only be in L1d, and the code-only line will be in L1i. So split caches make fine-grained mixing a waste of space. – Peter Cordes Mar 12 '19 at 14:01