Hey y'all, I'm having some issues with part of my homework. The problem had me assign a string to a variable, use a second variable to hold the size of the message, and then create a third variable that adds 5 to that size, using the EQU method.

EDIT: The problem I was assigned was attached as an image. I thought I did it correctly, but I keep getting a Segmentation fault (core dumped), which makes me think I messed something up.

SECTION .data
message:
    DB "You already know what the next", 0Ah
    DB "variable will be, don't you?", 0
length: EQU ($ - message)
length5: EQU (length + 5)

SECTION .text
global _main
_main:
add eax, length5
inc eax

int 80h
Ryan Burke
  • Can you show what you've done so far, and what part's not making sense? – Ian McGowan Oct 09 '17 at 23:35
  • `EQU` is a compile-time construct you can't use it to add to something at runtime. There is a misunderstanding somewhere. Unless of course it's a compile-time constant string in which case it's barely more than doing `foo EQU bar+5`, surely that causes no problems for you? – Jester Oct 09 '17 at 23:37
  • @IanMcGowan message: DB "Foo", 0Ah DB "Bar", 0 length: EQU ($ - message) length5: EQU (length + 5) – Ryan Burke Oct 09 '17 at 23:49
  • @IanMcGowan 'SECTION .data message: DB "You already know what the next", 0Ah DB "variable will be, don't you?", 0 length: EQU ($ - message) SECTION .text global _main _main: mov eax, 0 add eax, length inc eax int 80h' – Ryan Burke Oct 09 '17 at 23:55
  • By `EQU` you define symbols to the given constant value, don't use colon after those (those symbols are not labels), use colon after address labels, like `message:` (and use NASM compile option `-w+orphan-labels` to let the NASM warn you about labels without colon, will save you from typos like `rett` being compiled as label instead of `ret`). See chapter 3 of NASM docs for details: http://www.nasm.us/doc/nasmdoc3.html ... BTW, there are no variables in assembly at all, just memory, instructions, addresses, ... Whoever is teaching this "variables" is confusing you in the long run. – Ped7g Oct 09 '17 at 23:55
  • And edit your question with code, and with exact description of problem you have, what you posted as first reply with `message: DB "Foo"` looks almost correct (not sure if the colons with `EQU` cause syntax error or not, should be easy to figure out just by running nasm, which I expect you to do, and post results). Rather edit your question and add those details in the question text. – Ped7g Oct 09 '17 at 23:57
  • @Ped7g Thank you for the suggestions. I edited the main post to show what I have so far. Thank you in advance. – Ryan Burke Oct 10 '17 at 00:07
  • Your code makes no sense, please describe what you intended to do by adding to an uninitialized `eax` then incrementing by one and invoking a random system call... PS: never mind, I looked at the image. See, that is why you should copy the text of that into the question. – Jester Oct 10 '17 at 00:08
  • The assignment tells you to allocate `length5` **in the data section**. You have not done that. – Jester Oct 10 '17 at 00:11
  • @Jester Sorry, I'm new to all of this. But isn't my `length5` already in the data section? – Ryan Burke Oct 10 '17 at 00:31
  • `EQU` is compile time only, it does not allocate storage at all. The assignment isn't terribly clear, but I guess the requirement to use `EQU` only applied to the original `length`. You probably want to do `length5 dd length+5`. – Jester Oct 10 '17 at 00:33
  • @Jester Yeah, we've got a first time professor teaching, in which we've only met twice in the first six weeks of schooling. We are all behind. Thank you for your assistance. – Ryan Burke Oct 10 '17 at 00:36

1 Answer

I decided to toy around with the source code to show you some principles of assembly, why I dislike your use of the term "variable", and why Jester politely said "The assignment isn't terribly clear" (it's actually a bit ambiguous; as you can see, I opted for a different interpretation in my example).

First, the source I used (I named it so_nasm_syntax_equ.asm):

SECTION .data
message:        ; these compile to machine code bytes
    DB "You already know what the next", 0Ah
    DB "variable will be, don't you?", 0
length EQU ($ - message)        ; these don't compile to machine code, they define
length5 EQU (length + 5)        ; only constants for assembler during compilation
length5var:
    DB  length5                 ; this will compile as single byte in .data section
length5var2:
    DB  length+5                ; with value of that constant (plus another 5 here)
; meanwhile "length5var" is another constant for assembler, having as value memory
; address of target location where that byte containing the value will land.

SECTION .text
global _start
_start:

    inc byte [length5var]       ; one way to add 1 to the length5var
    add byte [length5var],1     ; another way to add 1 to the length5var
    ; one more way to add one to length5var (this time using two instructions)
    mov eax,1                   ; also demonstrating the aliasing of al/ax/eax/rax
    add [length5var],al         ; being single register, just of different bit size

    ; call sys_exit(0) to terminate correctly
    mov eax,1
    xor ebx,ebx
    int 80h

    ; you can't add to constant anything
    ; this will try to increment value in memory at address 0x41, leading to crash
    inc byte [length5]          ; as that memory address doesn't belong to the .data

    rett                ; warnings test
    ; vs
    ret

To build it I used (on a 64-bit "neon" Linux distro):

nasm -w+all so_nasm_syntax_equ.asm -l so_nasm_syntax_equ.lst -f elf32
ld -m elf_i386 so_nasm_syntax_equ.o -o so_nasm_syntax_equ

Output of compilation:

so_nasm_syntax_equ.asm:33: warning: label alone on a line without a colon might be in error

And here is the listing file it produced, interleaved with my comments/explanations:

 1                                  SECTION .data

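The first number on each line is the listing line number. It runs ahead of the source line number wherever one source line spans several listing lines, which is why the warning above reports source line 33 for what the listing shows as line 39.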

 2                                  message:        ; these compile to machine code bytes
 3 00000000 596F7520616C726561-         DB "You already know what the next", 0Ah
 4 00000009 6479206B6E6F772077-
 5 00000012 68617420746865206E-
 6 0000001B 6578740A           
 7 0000001F 7661726961626C6520-         DB "variable will be, don't you?", 0
 8 00000028 77696C6C2062652C20-
 9 00000031 646F6E277420796F75-
10 0000003A 3F00

The 8-digit hexadecimal number after the line number is the "address" (offset within the section). The pairs of hex digits that follow are the final machine code, i.e. the byte values stored in the executable file, which the OS later loads into memory before setting up the environment and jumping to the entry point. A trailing "-" just marks that the bytes for that source line are not finished and continue on the next listing line.

Note how line 2, message:, didn't produce any machine code itself. All it does is create the symbol message, which is available to the assembler during compilation (and also to the linker, when you declare a particular symbol as global). The value of the symbol message here is 0x00000000 = the offset of the first byte, which has value 0x59, the letter 'Y' in UTF-8 encoding (and also in ASCII).

You can't deduce anything else from the message symbol: not how many bytes are defined after it, nor that a DB directive was used after it, etc. The message symbol is just a memory address, nothing more. That's why I don't like the word "variable" in assembly. Variables in C/C++, for example, are much more: they not only point to the first byte of the allocated space, but the compiler also knows the type and total allocated size of the variable and uses that in expressions. The assembler has none of that; message = 0x00000000 and that's all there is to it.
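To illustrate that a label carries no size or type, here is a minimal sketch. The label buffer and the stored values are made up for this example, and it assumes the same 32-bit Linux / NASM setup as the rest of this answer. Since the label is only an address, the operand size has to be spelled out whenever no register implies it:

```nasm
SECTION .data
buffer:                             ; hypothetical label, just an address
    DB 1, 2, 3, 4                   ; four bytes; nothing records "4" anywhere

SECTION .text
global _start
_start:
    mov byte  [buffer], 0x11        ; writes 1 byte at buffer
    mov word  [buffer], 0x2222      ; writes 2 bytes starting at buffer
    mov dword [buffer], 0x33333333  ; writes 4 bytes starting at buffer
    ; mov [buffer], 0x44            ; rejected by NASM: operation size not specified

    mov eax, 1                      ; sys_exit
    xor ebx, ebx                    ; exit status 0
    int 80h
```

Nothing stops the dword write from running past the end of a shorter definition either; keeping sizes consistent is entirely up to the programmer.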

11                                  length EQU ($ - message)        ; these don't compile to machine code, they define
12                                  length5 EQU (length + 5)        ; only constants for assembler during compilation

Here I defined two more constants with the EQU directive, so now length = 0x3C and length5 = 0x41. They will never reach the binary by themselves; they are visible only to NASM while it assembles the remaining lines of this source.
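As a rough sketch of what such a constant is good for (this is my own illustration, not part of the assignment; it assumes the 32-bit int 80h Linux ABI used elsewhere in this answer), length can be used as an immediate operand, for example as the byte count of a sys_write call:

```nasm
SECTION .data
message:
    DB "You already know what the next", 0Ah
    DB "variable will be, don't you?", 0
length EQU ($ - message)    ; 0x3C, known only to the assembler

SECTION .text
global _start
_start:
    mov eax, 4              ; sys_write
    mov ebx, 1              ; file descriptor 1 = stdout
    mov ecx, message        ; address of the first byte
    mov edx, length         ; immediate 0x3C, encoded inside the instruction
    int 80h

    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 80h
```

Note that length here also counts the terminating 0 byte, so that byte is written to stdout as well.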

13                                  length5var:
14 0000003C 41                          DB  length5                 ; this will compile as single byte in .data section

Here the length5 constant is used to define the value of a single byte, pointed at by another symbol, length5var = 0x0000003C.

15                                  length5var2:
16 0000003D 41                          DB  length+5                ; with value of that constant (plus another 5 here)

Here another byte is defined, this time using the constant length and an arithmetic expression (constant + 5) that can be evaluated during compilation, so it again produces the same 0x41 value in the executable. I also defined another label ahead of it, so the length5var2 constant is equal to 0x0000003D.
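If a full 32-bit value is wanted instead of a single byte (which is what Jester's `length5 dd length+5` suggestion in the comments amounts to), a minimal sketch could look like this. The name length5dword is my own placeholder, and reading the assignment this way is an assumption:

```nasm
SECTION .data
message:
    DB "You already know what the next", 0Ah
    DB "variable will be, don't you?", 0
length EQU ($ - message)        ; assemble-time constant, allocates nothing
length5dword:                   ; hypothetical name
    DD length + 5               ; 4 bytes of storage: 41 00 00 00

SECTION .text
global _start
_start:
    mov eax, [length5dword]     ; load the stored value, eax = 0x41
    inc eax                     ; eax = 0x42
    mov [length5dword], eax     ; store it back to memory

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; exit status 0
    int 80h
```

With a dword in memory, a 32-bit register can be loaded and stored directly, with no need to worry about neighbouring bytes.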

17                                  ; meanwhile "length5var" is another constant for assembler, having as value memory
18                                  ; address of target location where that byte containing the value will land.
19                                  
20                                  SECTION .text
21                                  global _start
22                                  _start:

Here the _start value is defined as offset 0x00000000 into the .text section (message has the same offset in this listing, but it is relative to the .data section, so the two end up with different values once the linker assigns final addresses to both sections).

Also, _start is made global so that the linker can find it in the .o file and use it during linking (to mark the correct entry point of the app for the OS loader).

23                                  
24 00000000 FE05[3C000000]              inc byte [length5var]       ; one way to add 1 to the length5var
25 00000006 8005[3C000000]01            add byte [length5var],1     ; another way to add 1 to the length5var

These should be self-explanatory. Just note how the length5var address 0x0000003C is part of the instruction's machine code (here still in its pristine 0x0000003C form; the linker will patch it to the correct final address when it builds the executable).

26                                      ; one more way to add one to length5var (this time using two instructions)
27 0000000D B801000000                  mov eax,1                   ; also demonstrating the aliasing of al/ax/eax/rax
28 00000012 0005[3C000000]              add [length5var],al         ; being single register, just of different bit size

Here is another way of adding 1 to the value in memory, this time using register al as the source of the value 1. Because al is used in the add instruction, the assembler can deduce the size of the memory operand, so I don't have to write byte ahead of [length5var]: al is byte-sized. al is of course equal to 1, because I loaded the whole 32-bit eax with the value 1, and al is an alias for the lowest 8 bits of eax.
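The aliasing and the size deduction can be sketched together as follows. The label counter and the constant 0x11223344 are made up for this illustration, and the storage is deliberately a full dword so that the wider additions stay inside it:

```nasm
SECTION .data
counter:                    ; hypothetical label
    DD 0                    ; 4 bytes, initially 00 00 00 00

SECTION .text
global _start
_start:
    mov eax, 0x11223344     ; now al = 0x44 and ax = 0x3344
    add [counter], al       ; byte add:  memory becomes 44 00 00 00
    add [counter], ax       ; word add:  0x0044 + 0x3344 = 0x3388
    add [counter], eax      ; dword add: 0x00003388 + 0x11223344 = 0x112266CC

    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 80h
```

In each add the register operand alone tells NASM how many bytes of memory to touch, so no byte/word/dword keyword is needed.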

29                                  
30                                      ; call sys_exit(0) to terminate correctly
31 00000018 B801000000                  mov eax,1
32 0000001D 31DB                        xor ebx,ebx
33 0000001F CD80                        int 80h

This terminates the program, which is actually the only effect visible from outside (terminating correctly, without a crash). To see those inc/add instructions in action, use a debugger and single-step over them, checking that the memory value goes from 0x41 to 0x42 (and from 0x42 to 0x43 with the second addition, etc.).

34                                  
35                                      ; you can't add to constant anything
36                                      ; this will try to increment value in memory at address 0x41, leading to crash
37 00000021 FE0541000000                inc byte [length5]          ; as that memory address doesn't belong to the .data

But if you try to use the length5 constant in the same way, NASM will just substitute the length5 symbol with the numeric value 0x41 and compile it as inc byte [0x41], which is understood as absolute addressing: an attempt to access memory at absolute address 0x41 (it will not be relocated).

This actually shows that message and length5 are not the same kind of symbolic constant. message compiles to "address 0x00", and NASM knows it belongs to the .data section, so it generates relocation data for it as needed, while length5 is just the pure numeric value 0x41. When you use it as an address, it is not relocated, and absolute address 0x41 is accessed (this would cause a crash if the app hadn't already been terminated by the preceding int 80h).

The values in machine code that are subject to relocation are marked by [] in the listing; compare the second inc's machine code with the first one. The opcode FE05 is the same and the encoded value 0x3C vs 0x41 is different, but the [] marks (which are not part of the machine code, just an annotation for the reader of the listing file) mean that NASM emits accompanying relocation entries in the object file, so the linker knows which bytes of code to patch with the actual final address.

So if you check this binary disassembled in a debugger when it is ready to be executed, the first inc opcode FE05[3C000000] will look like FE05E4900408 (in my build, length5var ended up at address 0x80490e4 in memory). The second inc opcode is still FE0541000000 (no relocation applied to this one).
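The same distinction shows up with plain mov instructions. This is a small sketch of mine rather than part of the original program; the encodings in the comments are what I would expect in a NASM listing for -f elf32, so treat them as something to check against your own .lst file:

```nasm
SECTION .data
message:
    DB "You already know what the next", 0Ah
    DB "variable will be, don't you?", 0
length EQU ($ - message)

SECTION .text
global _start
_start:
    mov eax, message        ; an address: expect B8[00000000], patched by the linker
    mov ecx, length         ; a plain number: expect B93C000000, never patched

    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 80h
```

Both operands are "constants" in the source, but only the one derived from a label gets a relocation entry.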

38                                  
39                                      rett                ; warnings test
40                                      ; vs
41 00000027 C3                          ret

This is just a test of the warning about labels without colons. Used properly, it can help you catch typos in instructions, as long as you distinguish labels from instructions by always putting a colon after labels. rett: would then produce no warning, and you can see in the source that it is a label, not an instruction.

Without the warning, rett is silently turned into a label, and if it was a typo (for the ret instruction), the 0xC3 opcode for ret will be missing from the code, producing unexpected behaviour. The example above produces only a single 0xC3 opcode, for the correct ret.


And to complete this tour of the basics of assembly language usage, this is how the content of the executable binary looks after applying strip so_nasm_syntax_equ to remove symbol (debug) information that isn't needed here:

$ strip so_nasm_syntax_equ
$ hd -v so_nasm_syntax_equ
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  80 80 04 08 34 00 00 00  |............4...|
00000020  00 01 00 00 00 00 00 00  34 00 20 00 02 00 28 00  |........4. ...(.|
00000030  04 00 03 00 01 00 00 00  00 00 00 00 00 80 04 08  |................|
00000040  00 80 04 08 a8 00 00 00  a8 00 00 00 05 00 00 00  |................|
00000050  00 10 00 00 01 00 00 00  a8 00 00 00 a8 90 04 08  |................|
00000060  a8 90 04 08 3e 00 00 00  3e 00 00 00 06 00 00 00  |....>...>.......|
00000070  00 10 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  fe 05 e4 90 04 08 80 05  e4 90 04 08 01 b8 01 00  |................|
00000090  00 00 00 05 e4 90 04 08  b8 01 00 00 00 31 db cd  |.............1..|
000000a0  80 fe 05 41 00 00 00 c3  59 6f 75 20 61 6c 72 65  |...A....You alre|
000000b0  61 64 79 20 6b 6e 6f 77  20 77 68 61 74 20 74 68  |ady know what th|
000000c0  65 20 6e 65 78 74 0a 76  61 72 69 61 62 6c 65 20  |e next.variable |
000000d0  77 69 6c 6c 20 62 65 2c  20 64 6f 6e 27 74 20 79  |will be, don't y|
000000e0  6f 75 3f 00 41 41 00 2e  73 68 73 74 72 74 61 62  |ou?.AA..shstrtab|
000000f0  00 2e 74 65 78 74 00 2e  64 61 74 61 00 00 00 00  |..text..data....|
00000100  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  0b 00 00 00 01 00 00 00  |................|
00000130  06 00 00 00 80 80 04 08  80 00 00 00 28 00 00 00  |............(...|
00000140  00 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00  |................|
00000150  11 00 00 00 01 00 00 00  03 00 00 00 a8 90 04 08  |................|
00000160  a8 00 00 00 3e 00 00 00  00 00 00 00 00 00 00 00  |....>...........|
00000170  04 00 00 00 00 00 00 00  01 00 00 00 03 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  e6 00 00 00 17 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
000001a0

At offset 00000080 you can see that the first inc has already been relocated by the linker to the target address in the .data section, while the other inc at offset 000000a1 is left intact, still having machine code fe 05 41 00 00 00.

Looks like this was quite instructive even for me, as I keep mixing up which part of the machine code is patched by the linker and which by the OS while loading the binary. (Normally you don't need this while programming in assembly. The important part is to understand that those addresses and symbols are compile-time constants, and that when you want dynamic memory management, you have to write all the code around it yourself, storing and using memory address values at run time.)

Ped7g
  • The OS doesn't patch anything while loading binaries, AFAIK. For ASLR to work on executables, you have to create PIEs in the first place, so the kernel can map your text+data wherever it wants. ([Default for gcc in many distros recently](https://stackoverflow.com/questions/43367427/32-bit-absolute-addresses-no-longer-allowed-in-x86-64-linux/46493456#46493456)). The dynamic linker will write the GOT while dynamic linking, but you made a static executable. And even then, it doesn't patch the main part of the text segment. Thus, everything has to be a link-time constant. – Peter Cordes Oct 10 '17 at 02:28