
I need help with strings in emu8086. I have initialized a string:

str1 db "0neWord"

And I have an empty string:

str2 db ?

Now I need to check all letters in str1 and copy to str2, but if the letter in str1 is 0, I need to replace it with O. If not, I need to just copy the letter.

How can I do this?

Peter Mortensen

2 Answers

str2 db ? is not an empty string. db stands for "define byte", and the ? means a single uninitialized byte.

The db "0neWord" form is a convenience of the assembler; it compiles into a series of bytes defined as '0', 'n', 'e', ..., 'd'. There's no such thing as a "string" type in assembly: everything is compiled into machine code, which can be viewed as a series of bytes. What "type" of data is stored in memory depends on the instructions used to access it, but in memory everything is just a series of bytes and can be viewed as such.
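To make that concrete, here is a small sketch, assuming MASM/TASM-style syntax (which emu8086 mostly follows). The three labels are made up for illustration only. All three definitions should assemble to the same seven bytes; only the notation in the source differs.

```asm
; Three ways to write the same 7 bytes (labels are hypothetical).
str1_text  db "0neWord"                           ; quoted-text convenience form
str1_chars db '0', 'n', 'e', 'W', 'o', 'r', 'd'   ; one character per byte
str1_hex   db 30h, 6Eh, 65h, 57h, 6Fh, 72h, 64h   ; the ASCII codes themselves
```

Note that the first byte is 30h (the digit zero), not 4Fh (the capital letter O), which is exactly the difference the task has to detect.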

This is probably a good time to check the emu8086 debugger documentation and, after loading the code into the debugger, look at the memory at address str1 to see how it was assembled.

So as soon as you copy the second byte from str1 to str2, you will start overwriting memory you didn't expect to overwrite.

To allocate a fixed-size memory buffer you can use, for example, str2 db 100 DUP(?). This repeats the ? definition 100 times, reserving 100 bytes of memory; the next bytes in the same section will be placed from address str2+100 onward.


To do anything with str1 "string" you need to know:

1) its address in memory. The x86 assembler has many ways to get that, but the two most straightforward are:

  • mov <r16>,OFFSET str1 (r16 is any 16-bit register)
  • lea <r16>,[str1] (does the same thing in this case)

2) its size OR structure. You didn't put any structure there: nul-terminated strings, for example, have a byte with value 0 at their end, and the DOS int 21h, ah=9 service for displaying a string expects it to be terminated with a dollar sign '$'. So you need at least the size. The assembler's EQU directive and its "current position" counter can be used to calculate the size of str1 like this:

str1 db "0neWord"
str1size EQU $-str1  ; "$" is the assembler's "current address" counter

Hm, I tried to verify this first by reading some docs, but it's very difficult to find good, complete emu8086 documentation (I found something like a "reference", and it's completely missing a description of assembler directives).

I wonder why so many people still land on this instead of Linux + NASM or similar tools, which are completely free, open source and documented.

So let's hope emu8086 works like MASM/TASM and that I still recall that syntax correctly; then the size definition mentioned above should work. Otherwise consult your examples/docs.
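Assuming the $-str1 form is accepted, the same constant can also be used to size the target buffer, so the two can never get out of step. This is only a sketch in MASM/TASM-style syntax; the EQU line has to come directly after str1, before any other data is defined, otherwise "$" no longer marks the end of the string.

```asm
str1     db "0neWord"
str1size EQU $-str1            ; 7 = distance from str1 to the current address
str2     db str1size DUP(?)    ; reserve exactly as many bytes as str1 occupies
```

If the copy should later be printed with int 21h, ah=9, the buffer would need one extra byte for the '$' terminator.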


Finally, when you have the address, the size, and a large enough target buffer (again, to load its address you can use OFFSET or lea in emu8086), you can code your task, for example, in this way:

    ; pseudo code follows, replace it by actual x86 instructions
    ; and registers as you wish
    ; ("r16_something" means one of 16b register, r8 is 8b register)
    lea   r16_str1,[str1]   ; load CPU with address of str1
    mov   r16_counter,str1size  ; load CPU with str1 size value
    lea   r16_str2,[str2]   ; load address of target buffer
loop_per_character:
    mov   r8_char,[r16_str1] ; read single character
    cmp   r8_char,'0'
    jne   skip_non_ascii_zero_char
    ; the character is equal to ASCII '0' character (value 48)
    mov   r8_char,'O'   ; replace it with 'O'
skip_non_ascii_zero_char:
    ; here the character was modified as needed, write it to str2 buffer
    mov   [r16_str2],r8_char
    ; make both str1/2 pointers to point to next character
    inc   r16_str1
    inc   r16_str2
    ; count down the counter, and loop until zero is reached
    dec   r16_counter
    jnz   loop_per_character
    ; the memory starting at "str2" should now contain
    ; modified copy of "str1"

    ; ... add exit instructions ...

Hmm.. it turns out the "pseudo code" is full x86 code; you just have to assign real registers to the pseudo ones and replace them everywhere in the source.
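As one possible assignment (a sketch, not tested here, and assuming emu8086 accepts the MASM-style directives used elsewhere on this page): SI for the source pointer, DI for the target pointer, CX for the counter and AL for the character. The endmark label and the printing at the end are additions for illustration; they are not part of the pseudo code.

```asm
.model small

.data
        str1     db "0neWord"
        str1size equ $-str1          ; number of bytes in str1 (7)
        str2     db str1size dup(?)  ; target buffer of the same size
        endmark  db '$'              ; hypothetical: terminator right after str2,
                                     ; so int 21h / ah=9 can print the copy

.code
main:
        mov   ax, @data
        mov   ds, ax                 ; DS must point at the data segment

        lea   si, str1               ; SI = address of the source
        lea   di, str2               ; DI = address of the target buffer
        mov   cx, str1size           ; CX = characters left to copy

loop_per_character:
        mov   al, [si]               ; read one character
        cmp   al, '0'
        jne   store_char             ; anything but ASCII '0' is copied as-is
        mov   al, 'O'                ; replace digit zero with capital O
store_char:
        mov   [di], al               ; write the character to str2
        inc   si                     ; advance both pointers
        inc   di
        dec   cx                     ; count down, loop until zero
        jnz   loop_per_character

        mov   ah, 09h                ; DOS: print '$'-terminated string at DS:DX
        mov   dx, offset str2
        int   21h

        mov   ah, 4Ch                ; DOS: exit program
        int   21h

end main
```

Because the loop is driven by the size in CX, str1 itself needs no terminator; the '$' after str2 exists only for the DOS print service.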

I tried to comment the pseudo code very extensively (from my point of view), so that you can understand every instruction used. You should look each one up in Intel's instruction reference guide, cross-reading it with whatever tutorials/lessons you have available for assembly, until you feel you understand what a register is, what memory is, etc.

Also debug the code instruction by instruction, checking the state of the CPU (register values, flags) and the memory content after each instruction, to get an idea of how it works.

Ped7g

There are multiple ways to do that. Here are some examples:

1) With String Instruction:

.model small

.data 

        str1 db "0neWord$"

        size equ $-str1

        str2 db size dup (?)    ; reserve `size` uninitialized bytes


.code  

main:

        mov ax, @data
        mov ds, ax 

        mov cx, size

        cld          ; DF might have been set, but we want lodsb to go forwards
        lea si, str1
        mov ax, 0  
        mov bx, 0

     copyStr:

        lodsb  ;str1 to al

        cmp al, '0'
        je alterChar        

        mov str2[bx], al
        jmp continue

     alterChar:

        mov str2[bx], 'o'

     continue:

        inc bx

        loop copyStr 

        ; no extra terminator needed: size includes the '$', so it was already copied

        mov ah, 09h 
        lea dx, str2
        int 21h     

        mov ah, 04ch
        int 21h                  

end main 

2) Without String Instruction:

.model small

.data 

        str1 db "0neWord$"
        size equ $-str1
        str2 db size dup (?)    ; room for the whole copy, including the '$'

.code  

main:

        mov ax, @data
        mov ds, ax

        mov si, 0               

        call copyStr      

        mov ah, 09h
        lea dx, str2
        int 21h

        mov ah, 04ch
        int 21h

   copyStr proc 

        mov bx, 0

      compute:            

           mov bl, str1 [si]

           cmp bl, '0'
           je alterChar 

           mov str2[si], bl

           jmp continue           

        alterChar:

           mov str2 [si], 'o'    

        continue:

           inc si

           cmp str1[si], '$'
           je return                      
           jmp compute

      return:

           mov str2[si], '$' 
           ret     

   copyStr endp    

end main 

LODSB Instruction you can learn more about string instructions from here

3) With lodsb / stosb and simplified / optimized:

.model small
.data 
        str1 db "0neWord$"
        size equ $-str1

        str2 db size dup (?)    ; reserve `size` uninitialized bytes

.code  
main:
        mov   ax, @data
        mov   ds, ax
        mov   es, ax       ; stosb stores to [es:di]

        mov   si, OFFSET str1
        mov   di, OFFSET str2
        cld          ; make sure stosb/lodsb go forwards

     ; copy SI to DI, including the terminating '$'
     copyStr:           ; do {
        lodsb             ; str1 to al

        cmp   al, '0'
        je    alterChar        

     doneAlteration:
        stosb             ; al to str2
        cmp   al, '$'
        jne   copyStr    ; } while(c != '$')

        mov   ah, 09h      ; print implicit-length string
        mov   dx, OFFSET str2
        int   21h     

        mov   ah, 04ch     ; exit
        int   21h                  

     alterChar:
        mov   al, 'o'
        ;jmp   doneAlteration
        stosb
        jmp   copyStr

end main 
Peter Cordes
Ahtisham
  • You could use `stosb` to make this much more compact, and simplify the branching so for inputs other than `'0'` there's no taken branch. (i.e. jump to a block that sets `al` and jumps back into the loop, instead of forcing the not-taken side to also jump.) – Peter Cordes Nov 16 '17 at 19:06
  • Also, `lea dx, str2` is one byte longer than `mov dx, OFFSET str2`, and has no advantages. – Peter Cordes Nov 16 '17 at 19:29
  • I made an edit to add a more efficient version with the changes I suggested in comments. (untested). Instead of using the `loop` instruction, I branched on the terminating `$`. [This is faster on modern CPUs](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently), but maybe slower on a real 8086 because `loop` is more compact (less code inside the loop, even though it needs more setup outside). My version is probably still faster on a real 8086 though, especially if most characters aren't `'0'`. – Peter Cordes Nov 16 '17 at 19:33
  • And BTW, I could have posted my version as a separate answer, but this question didn't really need yet another answer but the inefficiency of your code was bugging me so I decided to improve yours. Feel free to re-edit however you like. – Peter Cordes Nov 16 '17 at 19:35
  • @PeterCordes well your code prints garbage characters. – Ahtisham Nov 16 '17 at 19:40
  • Thanks for testing it, what did I get wrong? Is `str2` not terminated properly after the copy? According to http://spike.scu.edu.au/~barry/interrupts.html#ah09, `ah=9 / int 21h` should print a `$`-terminated string from `ds:dx`. Oh, do we need a `cld` to clear DF? I'm used to 32 / 64-bit code where the usual calling conventions require DF=0 on function entry/exit. – Peter Cordes Nov 16 '17 at 19:47
  • changed from this `mov ah, 09h mov dx, OFFSET str2 int 21h` to this `mov ah, 09h mov dx, es:[di] int 21h` this fixed the problem :) – Ahtisham Nov 16 '17 at 19:54
  • Hrm, that doesn't make sense, you're loading the first 2 bytes of the string and passing it as a pointer. But mentioning `es` reminds me I forgot to initialize `es` = `ds`, and `stosb` writes to `[es:di]` – Peter Cordes Nov 16 '17 at 19:57
  • @PeterCordes yeah `stosb` writes to es that is where i got confused when i saw your code. – Ahtisham Nov 16 '17 at 19:59
  • Did you actually test with `mov dx, es:[di]`? If so, I'm curious how it worked. Can you single-step it? – Peter Cordes Nov 16 '17 at 20:04
  • @PeterCordes it actually prints the input itself. `es:[di]` prints the input array not the one stored in es. – Ahtisham Nov 16 '17 at 20:06
  • `es:[di]` prints `0neWord` instead of `oneWord` the one with `0` not `o`. – Ahtisham Nov 16 '17 at 20:09
  • The reason I ran your code is that I was curious about `stosb`. I had just learned that it stores the result in es, but your code was completely denying it. :) And btw, when I wrote my code I thought about `stosb` but didn't give it much attention; otherwise I would have written a simplified one just like yours ;) – Ahtisham Nov 16 '17 at 20:14
  • Your `mov dx, es:[di]` only worked by coincidence. It sets `dx = 0`, and `ds:0` is where `str1` is stored. (The memory at `es:[di]` hasn't been written yet, because it's **past the end** of where `stosb` was writing. emu8086 starts with it zero-initialized). In emu8086, with an `exe` template, `ds` and `es` start off as `0x0700`, but `@data` is `0x0710`. `str1` is at the start of the data segment, so `mov si, OFFSET str1` assembles to `mov si, 0`. This is why later zeroing `dx` by loading from untouched memory before `ah=9h / int 21h` prints `str1`. – Peter Cordes Nov 16 '17 at 20:21
  • And yeah it's pretty cool learning new instructions and figuring out what they're good for. :) There are still some [AVX512 instructions](https://en.wikipedia.org/wiki/AVX-512) I haven't really understood the full purpose of yet (and probably some that I forget even exist; there are a ton of new instructions). So I totally get how cool it is to find a good use for an instruction you're just learning about. – Peter Cordes Nov 16 '17 at 20:40
  • @PeterCordes woah! 512-bit? Am I dreaming? lol I am not able to learn the 16-bit instructions fully. How can I learn the 512-bit ones? That would be really difficult to learn. – Ahtisham Nov 16 '17 at 20:47
  • They do the *same* thing to all elements in parallel. It only gets complicated when the elements aren't just `a[i .. i+15]`, and you're shuffling to line up some elements with some other elements. For looping over an array doing `c[i] = a[i] + b[i]`, it's no more complicated than with SSE2 with 4 packed elements in a 128b vector. If you're curious, see https://deplinenoise.wordpress.com/2015/03/06/slides-simd-at-insomniac-games-gdc-2015/ for a good intro to SIMD. – Peter Cordes Nov 16 '17 at 20:54