
Okay, I started learning 8086 assembly about a month ago, and until now I haven't had many problems with it, but now I am stuck on strings. How can I iterate over a string in 8086 and manipulate its characters? I also have a task for my microprocessors course: remove all the ' characters from a given string (the string "proce'so'r" in my code), then compare the resulting string with the original one and check whether they are equal. The thing is, I don't even know how to iterate over it. It was not really explained in class, so I am asking for help here. Here is my code so far (it only iterates over the string and prints the characters, and it does not work; I don't know why):

data segment
 string db "proce'so'r"
ends

stack segment
    dw   128  dup(0)
ends

code segment
start:

    lea di, string
    mov cx, 10

    for:
        cmp cx, 0
        je end


        mov dl, [di]
        mov ah, 02h
        int 21h 

        inc di
        loop for     
    end:


mov ax, 4c00h
int 21h  

ends

end start
cadaniluk
mouseepaad
  • Looks OK. Instead of `di` **and** `cx` I'd use only `di`, initialize it to zero, then increment it by one as long as `di < 10` and use `di` as the index to save one register. You could also use the `lods` instruction with an appropriate prefix. Do you have any specific question? – cadaniluk Oct 26 '15 at 12:02
  • I do: the code doesn't work and it is not OK :). So there is a bug that I cannot find. I removed cx and did what you said, and it still doesn't work. – mouseepaad Oct 26 '15 at 14:00
  • @mousepaad OK, partially my fault but you should include a specific question. Try `dec cx` and `jmp for` instead of `loop for`, which increments `cx`, not decrements it. Also, if you haven't initialized the segment registers, do it. – cadaniluk Oct 26 '15 at 14:02
  • OK, I tried with jmp instead of loop, and I also tried this code http://stackoverflow.com/questions/27889010/how-can-i-loop-over-the-characters-of-two-strings-simultaneously , but it just does not work. I seriously do not know what the problem is :(. – mouseepaad Oct 26 '15 at 14:12
  • @cad: loop does `dec cx / jne` (but without touching flags). Mousepad: set up the loop with the conditional branch at the end. If there's a condition that requires skipping the first iteration, test for it outside the loop. (In asm, `do{}while()` loops are the most natural.) I have no idea what DOS system calls do. Did you single-step your code in a debugger? – Peter Cordes Oct 26 '15 at 14:13
  • Sorry, I don't understand you, man. :) Could you please post code here showing what you mean? – mouseepaad Oct 26 '15 at 14:19
  • @PeterCordes Crap, I'm really sorry. I even checked the manual for a description of the mnemonic and somehow deduced `loop` **in**crements `cx`. Utter confusion. o.O But at least the two-instruction-way omits the conditional jump. – cadaniluk Oct 26 '15 at 14:28
  • You forgot to initialize `DS`. Insert directly after `start` two lines: `mov ax, data` & `mov ds, ax`. – rkhb Oct 26 '15 at 17:51

2 Answers


Expanding on my comment, a more efficient loop structure would be:

data segment
 string db "proce'so'r"
 stringlen equ $-string   ; subtract current string's address from current address
ends

start:

    mov ax, data
    mov ds, ax     ; Assuming rkhb is correct about segments

    lea   di, string
    mov   cx, stringlen   ; having the assembler generate this constant from the length of the string prevents bugs if you change the string

    ;; if stringlen can be zero:
     ;  test cx,cx
     ;  jz end

    ;; NASM-style .labels are "local": they don't end up as symbols in the object file, and don't have to be unique across functions.
    ;; emu8086 may not support them, and this Q is tagged as 8086, not just 16bit DOS on a modern CPU, so a plain label is used here.
    print_loop:
        mov   dl, [di]
        mov   ah, 02h  ; If int21h doesn't clobber ah, this could be hoisted out of the loop.  IDK.
        int   21h 

        inc   di
        dec   cx
        jg   print_loop  ;  or jne
    end:

          ;  Or save a register (and the mov to initialize it) with
          ;   cmp  di, offset string+stringlen 
          ;   jb   print_loop

         ;; loop print_loop  ; or save instruction bytes, but slower on modern CPUs

mov ax, 4c00h
int 21h
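
The commented-out `cmp di, offset string+stringlen` / `jb` alternative above can be written out as a full program. This is only a sketch, assuming emu8086/MASM-style syntax like the question uses; the label `done` is an illustrative name, and the string is assumed to be non-empty because the loop body always runs once.

```asm
; Sketch only: pointer-bounded loop, no separate counter register.
data segment
    string    db "proce'so'r"
    stringlen equ $-string          ; length computed by the assembler
ends

code segment
start:
    mov   ax, data
    mov   ds, ax                    ; DS must point at the data segment

    lea   di, string                ; DI = address of the first character

print_loop:
    mov   dl, [di]                  ; DL = current character
    mov   ah, 02h                   ; DOS function 02h: print character in DL
    int   21h

    inc   di
    cmp   di, offset string + stringlen
    jb    print_loop                ; keep going while DI is below the end address

done:
    mov   ax, 4c00h                 ; exit to DOS
    int   21h
ends

end start
```

Comparing the pointer against the end address frees `cx` for something else, at the cost of a slightly larger compare instruction.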

A more common way to handle strings is to terminate them with a zero byte. The loop condition then becomes `test dl,dl` / `jnz`, with no counter needed.
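
A minimal sketch of that zero-terminated variant, again assuming emu8086/MASM-style syntax. The names `msg`, `print_z` and `done` are made up for illustration, and the explicit `, 0` terminator is something you have to add to the data yourself.

```asm
; Sketch only: print a zero-terminated string, no length counter.
data segment
    msg db "proce'so'r", 0          ; trailing 0 marks the end of the string
ends

code segment
start:
    mov   ax, data
    mov   ds, ax

    lea   si, msg                   ; SI = address of the first character
    mov   dl, [si]
    test  dl, dl                    ; empty string? then skip the loop entirely
    jz    done

print_z:
    mov   ah, 02h                   ; DOS function 02h: print character in DL
    int   21h

    inc   si
    mov   dl, [si]                  ; fetch the next character
    test  dl, dl                    ; sets ZF when DL is the terminator
    jnz   print_z

done:
    mov   ax, 4c00h                 ; exit to DOS
    int   21h
ends

end start
```

The first load and test sit outside the loop so that the conditional branch can stay at the bottom.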

Also, note that using `si` as a source pointer and `di` as a destination pointer is typical.


You could copy your string while skipping a certain character by doing

    mov   dl, [si]
    inc   si
    cmp   dl, "'"    ; apostrophe, or write the value as a hex constant (27h); C-style '\'' escapes are not understood here
    je  nocopy
     mov   [di], dl
     inc   di
nocopy:

as part of your loop. At the start of the loop, you want `si` pointing to the first char in your input string, and `di` pointing to a buffer big enough to hold the result. The character to skip could be in a register, instead of hard-coded as an immediate operand to `cmp`.
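
Put together, the whole filtering loop might look like the sketch below. It assumes emu8086/MASM-style syntax; `buffer`, `copy_loop` and `done` are illustrative names that are not in the question, and the input is assumed to be non-empty.

```asm
; Sketch only: copy "string" into "buffer", skipping every apostrophe.
data segment
    string    db "proce'so'r"
    stringlen equ $-string
    buffer    db stringlen dup(0)   ; the result can never be longer than the input
ends

code segment
start:
    mov   ax, data
    mov   ds, ax

    lea   si, string                ; SI = source pointer
    lea   di, buffer                ; DI = destination pointer
    mov   cx, stringlen             ; CX = characters left to examine

copy_loop:
    mov   dl, [si]
    inc   si
    cmp   dl, 27h                   ; 27h is the ASCII code of the apostrophe
    je    nocopy
    mov   [di], dl                  ; keep this character
    inc   di
nocopy:
    dec   cx
    jnz   copy_loop

    ; DI now points one past the last byte written.
    mov   ax, di
    sub   ax, offset buffer         ; AX = length of the filtered string
    cmp   ax, stringlen             ; ZF set only if nothing was removed

done:
    mov   ax, 4c00h                 ; exit to DOS
    int   21h
ends

end start
```

Since characters are only ever removed, the filtered string can equal the original only when the two lengths match, so the final `cmp` is one cheap way to answer the "are they equal" part of the task. A byte-by-byte comparison would work too; it is just not needed for this particular case.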

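If you really want to save code bytes, at the expense of speed on modern CPUs, you could use the string instructions `lodsb` / `stosb`. Note that they only work through `al`: `lodsb` loads from `ds:si` into `al`, and `stosb` stores `al` to `es:di`, so `es` has to be set up as well.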
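
For comparison, here is a sketch of the same filter written with `lodsb` / `stosb`, under the same syntax assumptions as before. `buffer`, `filter` and `skip` are illustrative names, and pointing `es` at the data segment is my assumption about how the program would be laid out.

```asm
; Sketch only: apostrophe filter using the string instructions.
data segment
    string    db "proce'so'r"
    stringlen equ $-string
    buffer    db stringlen dup(0)
ends

code segment
start:
    mov   ax, data
    mov   ds, ax                    ; lodsb reads from DS:SI
    mov   es, ax                    ; stosb writes to ES:DI
    cld                             ; direction flag clear: SI/DI count upward

    lea   si, string
    lea   di, buffer
    mov   cx, stringlen             ; assumed non-zero

filter:
    lodsb                           ; AL = [DS:SI], then SI = SI + 1
    cmp   al, 27h                   ; 27h is the ASCII code of the apostrophe
    je    skip
    stosb                           ; [ES:DI] = AL, then DI = DI + 1
skip:
    loop  filter                    ; CX = CX - 1, jump while CX is not zero

    mov   ax, 4c00h                 ; exit to DOS
    int   21h
ends

end start
```

This version is shorter in bytes because the pointer increments are folded into the string instructions, which is the trade-off described above.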

Peter Cordes
  • Did you mean `jg` instead of `jge`? Since you print out the character first and decrement CX and compare I think it should be `jg`. @rkhb is correct about the manual updating of DS (even on EMU8086). I don't believe EMU8086 supports the concept of local labels. I believe it will choke on the period. For an EMU8086 example you might want to remove it. – Michael Petch Oct 26 '15 at 23:44
  • @MichaelPetch: Thanks, I did mean `jg`. So EMU8086 is an assembler as well as emulator? Because labels go away after assembling. I consider myself lucky that I've never actually programmed 16bit segmented-memory machines. I learned x86 asm on 32 and 64bit Linux, and find gdb fairly good for single-stepping etc. – Peter Cordes Oct 27 '15 at 02:00
  • Yes EMU8086 is a combined assembler/8086/IBM BIOS/DOS emulator/Debugger (reasonably complete, there are some holes). It is pretty much an academic tool. GDB wasn't really designed for debugging 16 bit code (try to step across a software interrupt like _int 21h_ and watch what happens). GDB also has no concept of CS:IP so if you debug a program that spans multiple segments you'll find the debugger will not work as expected. There are a set of extensions that can help with real mode debugging with GDB that can be found [here](http://ternet.fr/?p=gdb_real_mode). – Michael Petch Oct 27 '15 at 02:13
  • @MichaelPetch: oh yeah, I didn't mean to suggest GDB was any good for 16bit code. I was saying 16bit code sucks; don't spend time on it if you don't have to. :P I learned m68k asm before x86, so maybe I didn't find 32bit x86 asm too complex because I already understood the basics. 16bit has more odd corners, though, like you can't use `[cx]` as an effective address, or even as an index register. 32bit addressing modes are more uniform, so in many ways 32bit is easier to learn. (not to mention no segments, and 386 features like movzx instead of cbw in ax). – Peter Cordes Oct 27 '15 at 02:18
  • It seems academia wants to keep 8086 segmented assembler programming alive and well ;-). I agree though I think academia could use an overhaul. GNU/Linux is very easy to get, costs nothing, and is more likely to be a skill set that would be usable in the real world. A nice MINGW/MSYS2 environment on Win32 platforms can be a reasonable solution. I agree that using more modern processors makes things simpler. – Michael Petch Oct 27 '15 at 02:19
  • @MichaelPetch: The 2nd year undergrad comp sci course I took at school (Dalhousie) that taught assembly language and optimization used a RISC architecture for examples, I think. – Peter Cordes Oct 27 '15 at 02:21
  • U of Calgary 25 years ago (I am dating myself a bit) taught its first year Comp Sci weeder courses on PDP-11 simulator running on Sun mainframes. Second year was generally 6502. I started assembler on Z80/6502/8088 processors in the 80s. Dealing with 8088 segmented architecture was required. It is usually not a skill I use anymore except for Stackoverflow ;-) – Michael Petch Oct 27 '15 at 02:27
  • @Michael: I learned m68k asm in about 1997 while tweaking a Mandelbrot-set program on the Atari ST. (Like an Amiga: 68000 with 4MB of RAM). I had typed in the C source code from a magazine or book, and was making tweaks to try stuff or make it run faster. gcc ran too slowly, so I started just modifying the asm from an earlier compile :P Later that year, I went from Atari to a Linux PC when I started undergrad. So you have a few years on me :P – Peter Cordes Oct 27 '15 at 02:38

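You need to initialize DS so that it points at the data segment you defined. DOS does not do this for you when the program starts, so without it `[DI]` reads from the wrong segment and prints garbage instead of the string.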

Complete Solution:

DATA SEGMENT
 STRING DB "PROCE'SO'R"
ENDS

STACK SEGMENT
    DW   128  DUP(0)
ENDS

CODE SEGMENT
START:

    MOV AX,@DATA
    MOV DS,AX

    LEA DI, STRING
    MOV CX, 10      ; 10 decimal = length of the string (10H would be 16)

    FOR:
        CMP CX, 0
        JE END

        MOV DL, [DI]
        MOV AH, 02H
        INT 21H 

        INC DI
        LOOP FOR     
    END:

MOV AX, 4C00H
INT 21H  

ENDS

END START

Note the following after Start:

    MOV AX,@DATA
    MOV DS,AX
Pollob
  • `cmp cx, 0` / `je end` doesn't need to be inside the loop. The `loop` instruction already does that for every iteration except the first. And if you're going to use [the slow `loop` instruction](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently), you might as well use `jcxz`; skipping `loop` loops when they need to run zero times appears to be its main intended use-case. – Peter Cordes Dec 27 '17 at 18:34
  • True. I just copied the given code & added the required portion. – Pollob Dec 28 '17 at 18:39
  • So the only thing you changed was to add 2 lines, and uppercase everything? This would be a better answer if you had a bit more English text explaining the problem. – Peter Cordes Dec 29 '17 at 02:09