Why are labels not printed while re-writing assembly with just bytes, and how can a program always start at the same memory location?

Question

I have the following assembly code, which I assembled and linked into a final executable.

section .text
global _start

_start: mov eax, 4
        mov ebx, 1
        mov ecx, mesg
        mov edx, 9
        int 0x80
mesg    db      "Kingkong",0xa

The next thing I did was get its hexcode 0xb8,0x04,0x00,0x00,0x00,0xbb,0x01,0x00,0x00,0x00,0xb9,0x76,0x80,0x04,0x08,0xba,0x09,0x00,0x00,0x00,0xcd,0x80,0x4b,0x69,0x6e,0x67,0x6b,0x6f,0x6e,0x67,0x0a

and placed it into another program, which looks like the one below

section .text
global _start
_start:
        db 0xb8,0x04,0x00,0x00,0x00,0xbb,0x01,0x00,0x00,0x00,0xb9,0x76,0x80,0x04,0x08,0xba,0x09,0x00,0x00,0x00,0xcd,0x80,0x4b,0x69,0x6e,0x67,0x6b,0x6f,0x6e,0x67,0x0a

Now when I assemble the above file and run objdump on it, it gives me

08048060 <_start>:
 8048060:       b8 04 00 00 00          mov    $0x4,%eax
 8048065:       bb 01 00 00 00          mov    $0x1,%ebx
 804806a:       b9 76 80 04 08          mov    $0x8048076,%ecx
 804806f:       ba 09 00 00 00          mov    $0x9,%edx
 8048074:       cd 80                   int    $0x80
 8048076:       4b                      dec    %ebx
 8048077:       69 6e 67 6b 6f 6e 67    imul   $0x676e6f6b,0x67(%esi),%ebp
 804807e:       0a                      .byte 0xa

The mesg label is not seen in the final dump, how does the program then figure out the address of the mesg segment in the above program?

EDIT: After reading the answers, I would like to add a small follow-up question. I understand that labels are not used for the actual addressing and that the address is baked directly into the code. But if addresses are specified like mov $0x8048076,%ecx, what guarantees that the next time the program loads it will start at exactly the same address? What if I wrap this code in a C program? What if I want to run it on another machine with a completely different memory layout?

That last block is a "disassembly" -- the hex codes have been translated into the corresponding mnemonics. But, unless there is some debug info stashed somewhere, the labels were converted into code offsets, and there is no way to convert back. — Hot Licks, Aug 06 '13 at 17:43
the labels are only there for the humans to read, the processor doesnt need/use them. — old_timer, Aug 06 '13 at 17:59
Like this `char* a = "my hex code"` and inside the main function call it like a function (Shellcodes I mean). Have a look at sp's answer in this page http://stackoverflow.com/questions/15593214/linux-shellcode-hello-world — vikkyhacks, Aug 08 '13 at 06:26

Vivin Paliath · Accepted Answer · 2013-08-07T15:47:32.537

Labels are translated to offsets/addresses. You won't see the actual label unless you explicitly preserve that information for debugging.

The line:

mov    $0x8048076, %ecx

loads ecx with the value of mesg, which is the address 0x8048076, the start of your string Kingkong.

The program doesn't need to "figure out" what the value of mesg is because it doesn't even know that there is something called mesg. All it sees is an address, which is fine, because that's all it needs.

Using named labels is just convenient and helps with readability. They only really matter to the assembler and linker, which convert each label into its actual address or offset. They can also be used by the debugger (if you instruct the assembler or linker to preserve debugging information) to help you debug your code.
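To make the "label becomes an address" step concrete, here is a minimal Python sketch that redoes the assembler's arithmetic by hand. The bytes and the start address are copied from the question; the instruction lengths are read off the objdump listing, and the helper names `label_address` and `encoded_operand` are made up for this illustration.

```python
import struct

# Machine code bytes copied from the question.
CODE = bytes([
    0xb8, 0x04, 0x00, 0x00, 0x00,  # mov eax, 4
    0xbb, 0x01, 0x00, 0x00, 0x00,  # mov ebx, 1
    0xb9, 0x76, 0x80, 0x04, 0x08,  # mov ecx, mesg
    0xba, 0x09, 0x00, 0x00, 0x00,  # mov edx, 9
    0xcd, 0x80,                    # int 0x80
    0x4b, 0x69, 0x6e, 0x67, 0x6b, 0x6f, 0x6e, 0x67, 0x0a,  # "Kingkong\n"
])

BASE = 0x08048060  # address of _start, taken from the objdump listing

# Byte lengths of the five instructions that precede mesg (from the listing).
INSTRUCTION_LENGTHS = [5, 5, 5, 5, 2]


def label_address(base, lengths):
    """Address of whatever directly follows the listed instructions."""
    return base + sum(lengths)


def encoded_operand(code, opcode_offset):
    """Read the little-endian 32-bit immediate that follows a one-byte opcode."""
    (value,) = struct.unpack_from("<I", code, opcode_offset + 1)
    return value


mesg = label_address(BASE, INSTRUCTION_LENGTHS)
in_code = encoded_operand(CODE, 10)  # the b9 opcode sits at offset 10
print(hex(mesg), hex(in_code), mesg == in_code)
```

Both values come out as 0x8048076: the number the assembler and linker computed for mesg is the same number stored after the b9 opcode, so nothing named mesg has to survive into the executable.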

To address your second question:

The addresses that you have are virtual memory addresses (i.e., they are not physical memory addresses). This means your executable doesn't need to know what physical address it will end up at, since the OS maps each virtual address to a location in physical memory at runtime. Every process also gets its own virtual address space, so 0x8048076 in your program cannot collide with the same address in another program. This is why your executable will work if you run it on another machine (assuming the executable has been built for that OS and architecture) or if you run it repeatedly.
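As a rough illustration of that mapping, the following Python toy gives each "process" its own table from virtual pages to physical frames. It is only a model under simplifying assumptions: the `ToyProcess` class and the frame numbers are invented for this sketch, and a real kernel uses multi-level page tables walked by the hardware.

```python
PAGE_SIZE = 4096


class ToyProcess:
    """Hypothetical model: one private page table per process."""

    def __init__(self, name):
        self.name = name
        self.page_table = {}  # virtual page number -> physical frame number

    def map_page(self, virtual_address, frame):
        """Back the page containing virtual_address with the given frame."""
        self.page_table[virtual_address // PAGE_SIZE] = frame

    def translate(self, virtual_address):
        """Turn a virtual address into the physical address behind it."""
        frame = self.page_table[virtual_address // PAGE_SIZE]
        return frame * PAGE_SIZE + virtual_address % PAGE_SIZE


MESG = 0x08048076  # the virtual address baked into the question's code

first = ToyProcess("first run")
second = ToyProcess("second run")
first.map_page(MESG, frame=0x1A2)   # made-up frame numbers
second.map_page(MESG, frame=0x7F3)

print(hex(first.translate(MESG)), hex(second.translate(MESG)))
```

The same virtual address lands in two different physical locations, and neither process can tell the difference. The code only ever uses 0x8048076.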

You can take a look here and here for more information.

Well, if it is that way, then how is the operating system always able to let the program start at the same memory location? Shouldn't there be some form of relative addressing scheme? I have edited the question to make this clearer. Thanks for the answer. — vikkyhacks, Aug 07 '13 at 15:02

score 2 · Answer 2 · answered Aug 06 '13 at 17:46


The mesg label is not seen in the final dump, how does the program then figure out the address of the mesg segment in the above program?

Labels are only meaningful to the assembler and linker (and debugger). They will be replaced by their assigned addresses in the final machine code (which can be subject to change at runtime if the executable needs to be relocated).

As you can see in the disassembly, ecx is loaded with the address 0x8048076. At that address in the disassembly we find the bytes 4b 69 6e 67 ..., which correspond to the characters 'K', 'i', 'n', 'g'. In other words, ecx is now pointing to the beginning of your mesg string.
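You can check this by following the pointer yourself. The short Python sketch below takes the bytes and addresses from the question's listing, converts the value in ecx back to an offset into the byte string, and reads edx bytes from there. The helper name `bytes_at` is made up for this example.

```python
# Bytes and addresses copied from the question's objdump listing.
CODE = bytes.fromhex(
    "b804000000" "bb01000000" "b976800408" "ba09000000" "cd80"
    "4b696e676b6f6e670a"
)
BASE = 0x08048060  # address of the first byte (_start)
ECX = 0x08048076   # pointer loaded by mov $0x8048076,%ecx
EDX = 9            # length loaded by mov $0x9,%edx


def bytes_at(code, base, address, length):
    """Return `length` bytes starting at virtual address `address`."""
    start = address - base
    return code[start:start + length]


message = bytes_at(CODE, BASE, ECX, EDX)
print(message)
```

The slice is the nine bytes of "Kingkong" plus the newline. objdump shows them as dec and imul only because it disassembles everything in .text as instructions, data included.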


Michael


If it is that way, then if I run my program on another computer, how will it work? Won't it segfault, because there is no absolute certainty that the program will always be able to access `0x8048076` if some other program is already using it? – vikkyhacks Aug 07 '13 at 15:07
The executable only says where it _wants_ to be loaded. The OS can decide to load it at an entirely different location, and will take care of fixing up the addresses as part of load-time _relocation_. Read up on the PE and ELF executable formats for more info about this. – Michael Aug 07 '13 at 15:36
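The load-time fix-up described in that comment can be sketched as a toy in Python. This is an assumption-laden model, not how any real loader is implemented: the `RELOCATIONS` list, the `relocate` helper and the alternative base address are invented for the example, and real PE and ELF relocation records have their own formats. A relocation table is also something the file has to carry, and the small statically linked program in the question may well be loaded at its fixed virtual address with no patching at all.

```python
import struct

# Bytes copied from the question; the absolute address of mesg is at offset 11.
CODE = bytes.fromhex(
    "b804000000" "bb01000000" "b976800408" "ba09000000" "cd80"
    "4b696e676b6f6e670a"
)
PREFERRED_BASE = 0x08048060  # where the code was linked to start

# Hypothetical relocation table: offsets of 32-bit absolute addresses in CODE.
RELOCATIONS = [11]


def relocate(code, preferred_base, actual_base, relocations):
    """Return a copy of code with every listed address shifted to actual_base."""
    delta = actual_base - preferred_base
    patched = bytearray(code)
    for offset in relocations:
        (old,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (old + delta) & 0xFFFFFFFF)
    return bytes(patched)


# Pretend the loader placed the code at a different (made-up) address.
moved = relocate(CODE, PREFERRED_BASE, 0x10000060, RELOCATIONS)
(new_pointer,) = struct.unpack_from("<I", moved, 11)
print(hex(new_pointer))
```

After patching, the operand of the mov into ecx is 0x10000076, which is again 22 bytes past the new start, so the pointer still lands on the string.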
