Two days ago I started learning assembly, and I could not find answers to these questions on the internet, so I would be glad if you could help. I learned that the starting point of the program must be specified as global _start. I have two questions. First, in all the code I have seen, the global _start part was written inside the text section. Is it possible to write the global _start part outside the text section? Second, can the _start part of global _start be changed? If I type global _asd or global qwe to define the starting point of the program, will I get a syntax error?

Note: I'm currently on Ubuntu Linux. I'm using nasm as the assembler and ld as the linker.

Jester
Piamoon
  • 2
    1. It must be in an executable section. Depending on your linker and OS this may have to be `.text` or may be allowed to have a different name. If the code is not in a section which is certainly executable, it can end up being non-executable (NX bit set) so can cause a crash. 2. Usually you can use an option to the linker to change the label name used for the program entrypoint. If, for a normal application, you fail to link anything that has a `_start` label and do not specify a different name to the linker then the linking will fail with an error. That isn't a "syntax error" though. – ecm Sep 19 '20 at 10:50
  • 3
    You can put the `global` anywhere you like since that just marks the symbol global. You should put the symbol itself into an executable section as outlined above by @ecm. PS: please specify and tag with the assembler you use. – Jester Sep 19 '20 at 10:53
  • 2
    re: code outside `.text` - [segmentation fault with .text .data and main (main in .data section)](https://stackoverflow.com/q/34350582) - especially [Why data and stack segments are executable?](https://stackoverflow.com/q/7863200) - depending on how you assemble and link, you can end up with `.data` having RWX permissions. – Peter Cordes Sep 19 '20 at 12:41
  • 2
    You *can* specify any name you want for the process entry point when linking, with `ld -e my_entry_point_name` or whatever. This is usually more confusing than useful for people reading your source code; there are very few reasons to specify any symbol name other than `_start`. – Peter Cordes Sep 19 '20 at 12:42

2 Answers

This is a GNU ld question rather than a nasm one. When ld links, it looks for that symbol to mark as the entry point. Your question does not name the target architecture, but nasm implies x86, and you did say Linux.

Since the program you are building is loaded by an operating system like Linux, the entry point is critical, unless of course you manipulate the binary in some way or otherwise tell the linker what your entry point is. If the program is not executed in the proper order it will not operate properly and will quite likely crash. You can't just jump into the middle of a program and hope for success, much less begin executing in .data or something else that is not code.

Now, as mentioned in the comments (upvote the comments, please), you can change the entry point label if you don't want to use the _start label. If you do not define _start, ld gives a warning and continues, but unless you give it another label you risk the program being entered in the wrong place.

If this were bare metal, for a microcontroller for example, there would be no operating system loading the program into memory and entering the binary wherever you specify. You are instead governed by the hardware/logic and have to conform to its rules, crafting the code, linker script, command line, etc. so that the generated binary matches the entry point the logic specifies. In that case you can go without _start altogether and take whatever default ld puts in its output binary, which at some point is used to program the flash/ROM in the MCU (a step that strips all of that knowledge, including the entry point, from the binary file).

I am not so sure about nasm, but I assume you are always in some section, so the label will land somewhere. Suppose the label is not in a .text section and you are using it as the entry point (by default, by not specifying something else). Even if it is the last line before a .text section declaration, the linker is going to put that label with the other labels in the section where it lands. So because it sits in the file just before a .text declaration rather than just after, it may end up with an address that is nowhere near the code that follows it in the source file.

Here are some examples using GNU tools. The question is ld specific, so the target and assembler don't necessarily matter here.

MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000
    two   : ORIGIN = 0x2000, LENGTH = 0x1000
    three : ORIGIN = 0x3000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > one
    .data   : { *(.data*)   } > two
    .bss    : { *(.bss*)    } > three
}

.globl _start
_start:
    nop

Build it and inspect the result with readelf:

  Entry point address:               0x1000

Now if I add more labels:

.globl here
here:
    nop

.globl _start
_start:
    nop

.globl there
there:
    nop


00001000 <here>:
    1000:   e1a00000    nop         ; (mov r0, r0)

00001004 <_start>:
    1004:   e1a00000    nop         ; (mov r0, r0)

00001008 <there>:
    1008:   e1a00000    nop         ; (mov r0, r0)

  Entry point address:               0x1000

And that may be confusing... but let's move on.

arm-linux-gnueabi-ld -nostdlib -nostartfiles -e _start -T so.ld so.o -o so.elf

  Entry point address:               0x1004

Or instead

ENTRY(_start)
MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000
...


  Entry point address:               0x1004

But I can also do this:

    .globl here
    here:
        nop
    
        nop
    
    .globl there
    there:
        nop

ENTRY(there)
MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000

  Entry point address:               0x1008

Note that the linker did not warn about a missing _start.

If I now remove ENTRY() from the linker script:

  Entry point address:               0x1000

But if I do this:

arm-none-eabi-ld so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000

With no linker script given, ld uses its defaults, and those defaults look for _start. We can reproduce that ourselves with

ENTRY(_start)
MEMORY
{

but with no global _start label defined:

arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000

So if you are simply doing

nasm stuff myprog.asm stuff myprog.o
ld myprog.o -o myprog

You are using whatever default linker settings/script the tool/environment provides, and that likely has an ENTRY(_start) or equivalent as the default. If you are in complete control of the linker and want to load a program under Linux, you need a safe/sane entry point for the program to work. Otherwise ld defaults to the beginning of the binary or the beginning of .text, which we can test:

SECTIONS
{
    .text   : { *(.text*)   } > two
    .data   : { *(.data*)   } > one
    .bss    : { *(.bss*)    } > three
}

.globl here
here:
    nop

.data
.word 0x12345678

arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000002000


Disassembly of section .text:

00002000 <here>:
    2000:   e1a00000    nop         ; (mov r0, r0)

Disassembly of section .data:

00001000 <.data>:
    1000:   12345678

So the default is the beginning of .text, not the beginning or lowest address of the binary. We can even point the entry at data:

ENTRY(somedata)
MEMORY
{
    one   : ORIGIN = 0x1000, LENGTH = 0x1000
    two   : ORIGIN = 0x2000, LENGTH = 0x1000
    three : ORIGIN = 0x3000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > two
    .data   : { *(.data*)   } > one
    .bss    : { *(.bss*)    } > three
}


.globl here
here:
    nop

.data
.globl somedata
somedata: .word 0x12345678

  Entry point address:               0x1000

This is as trivial to do with nasm and ld as it is with gas and ld, as demonstrated above. It shows that _start isn't actually magic, any more than main() is, as far as ld (or even gcc) is concerned. _start seems magic because the default linker scripts call it out. main() seems magic because the language defines it as such, but in reality it is the bootstrap that makes it so. If you simply run

gcc helloworld.c -o helloworld

you get the default bootstrap and linker script. But you could write your own bootstrap, or modify the one in your C library, and have no main() in your program; the tools don't care and it will work fine. (Not all tools, of course: some detect main() and add critical stuff that might not otherwise get added, especially for C++.) The GNU tools are particularly flexible and generic, which is what makes them usable for so many targets, from bare metal to kernel drivers to operating system applications.

Use the tools you have. They are very powerful, so do experiments like the ones above first.
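For readers on the asker's setup, the three-label experiment above can be repeated with nasm on x86-64 Linux. This is only a sketch for inspection with readelf, not a program meant to be run (it has no exit code path), and the file name `so.asm` is an assumption chosen to mirror the `so.o` used earlier.

```nasm
; so.asm - three global labels in .text, for inspecting the ELF entry point only.
; The global directives are grouped at the top; they do not need to sit next to
; the labels they export.
global here
global _start
global there

section .text
here:
    nop                 ; single-byte instruction on x86
_start:
    nop
there:
    nop
```

Assemble with `nasm -f elf64 so.asm -o so.o`, link with `ld so.o -o so.elf`, and read the header with `readelf -h so.elf`. With ld's default linker script the entry point should be the address of `_start`, one byte past `here`. Linking with `ld -e there so.o -o so.elf` should move it to `there` instead.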

halfer
old_timer
  • 1
    Does ld have a command line parameter to define an entry point name other than _start? – rcgldr Sep 19 '20 at 15:16
  • 2
    @rcgldr: Yes, `ld -e entry_point_name`, which this answer even uses at one point. No need for linker scripts for the basic examples, and it's probably confusing for beginners to have them in the same code block as different files (`.s` asm sources). – Peter Cordes Sep 19 '20 at 15:47
  • 2
    "I am not so sure about nasm, but assume you are always in some section, so the label will land somewhere." The [manual does state](https://www.nasm.us/doc/nasmdoc8.html#section-8.1.3) that "Any code which comes before an explicit SECTION directive is directed by default into the .text section." for the multi-section `bin` output format. I assume the same is true for `elf`. – ecm Sep 19 '20 at 20:13
  • @ecm I was just being generic. If a tool doesn't allow that, it should complain, and as demonstrated above the GNU assembler defaults to .text as well. If you guys don't like the answer, delete it, edit it, make your own, etc. – old_timer Sep 20 '20 at 00:01
  • When you have built an asm file that lays out the ELF header the magic goes away. – Joshua Sep 24 '20 at 21:36

> I learned that the starting point of the program must be specified as global _start

No, that is not required. Any name can be used for the starting point instead of `_start`.

> Is it possible to write the global _start part outside the text section?

Yes. The `global _start` directive can appear anywhere in the source file. The `_start:` label itself should be in an executable section such as `.text`.

> can the _start part in the global _start be changed? So if I type global _asd or global qwe for defining the starting point of the program, will I get a syntax error?

Yes, it can be changed. You will not get a syntax error, but you need to give the linker the name of the starting point on the command line:

ld -e starting_point_name app.o -o app
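As a minimal sketch of the above, assuming x86-64 Linux and nasm's `elf64` output format, here is a complete program whose entry symbol is not `_start`. The names `my_entry` and `entry.asm` are hypothetical, chosen only for illustration.

```nasm
; entry.asm - exits with status 0, using a custom entry symbol.
global my_entry          ; may be written anywhere in the file, in any section

section .text            ; the label itself goes in an executable section
my_entry:
    mov eax, 60          ; system call number for exit on x86-64 Linux
    xor edi, edi         ; exit status 0
    syscall
```

Build it with `nasm -f elf64 entry.asm -o entry.o` followed by `ld -e my_entry entry.o -o entry`. If the `-e my_entry` option is left out, ld should warn that it cannot find the entry symbol `_start` and fall back to a default address, as shown in the other answer.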

Naveed Hematmal
  • 1
    `global _start` can go anywhere in the source file, before or after the label, and that directive doesn't care what the current section is. The `_start:` label itself should be in `section .text`, where you should also put your code. You could use a custom name for an executable section, or link in a way that makes everything executable including `.data` if you put code there, for example. See my comments under the question for links with examples of doing that. – Peter Cordes May 19 '23 at 20:53
  • @PeterCordes thanks for that. I have changed my answer – Naveed Hematmal May 19 '23 at 21:05
  • I think your answer would be better if you made it clear that `_start:` and `global _start` are two separate things; the phrasing in the question isn't clear about that, and future readers might not realize that directives like `global` that modify symbols don't have to be next to the label that defines the symbol. The `_start:` label itself should be in `.text` unless you're doing other tricky stuff on purpose, but putting `global` directives at the top of the file is not rare, to maintain the list of exported symbols in one place. – Peter Cordes May 19 '23 at 21:10