
I have a task in my assembly course to count the number of words in a string, and I need to save my answer in the CX register. (I'm working on an 80x86 processor.)

So I've set:

cx to 0 - this will be my counter

bx to 0 - this will be my index

I want to know whether I'm using it properly. This is my code:

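.model small
.stack 100h
.data

A db '  this   is  a  test  $'   ; input string, terminated by '$'

B db 100 dup('$')

.code
    mov ax, @data
    mov ds, ax          ; point DS at the data segment

    mov cx, 0           ; CX = word counter
    mov bx, 0           ; BX = index into A

looping:                ; skip spaces until a non-space character
    cmp A[bx], ' '
    jne foundchar
    inc bx
    jmp looping

foundchar:              ; walk through the current word
    inc bx
    cmp A[bx], ' '
    je foundword        ; a space ends the word
    cmp A[bx], '$'
    je soff             ; end of string
    jmp foundchar

foundword:
    inc cx              ; count the word that just ended
    inc bx
    jmp looping

soff:
    .exit

end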

Someone else from my class did it differently: she set SI to the offset of A... and I didn't really understand this solution:

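.model small
.stack 100h
.data
A db '  this   is  a  test  $'
.code
    mov ax, @data
    mov ds, ax

    mov cx, 0
    mov si, offset A    ; SI points at the first character of A

    mov dl, 0

next2:
    inc si              ; move to the next character (A[0] is only ever read as [si-1])
    mov dl, [si]
    cmp dl, "$"
    je soff             ; end of string
    cmp dl, " "
    je test1            ; if the char is a space, check the char before it
    jmp next2           ; if the char is not a space, jump back to the loop

test1:
    mov al, [si-1]
    cmp al, ' '
    je  next2           ; space after a space: no word ended here
    add cx, 1           ; space after a non-space: a word just ended
    jmp next2

soff:
    mov al, [si-1]
    cmp al, ' '
    je sofsofi          ; string ended with a space: nothing left to count
    add cx, 1           ; string ended right after a word: count it

sofsofi:
    .exit
end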

Please help me understand it, and tell me what the more proper way to do it would be.

Thanks a lot

Joe

1 Answer


To answer the question in the title: using BX as an index will work, but it is not considered "best practice". Use the Destination Index (DI) or Source Index (SI) register instead, depending on whether you write or read data, since holding indexes is exactly what DI and SI are for.

And now the explanation:

Although (x86) registers are named, you can (with certain limitations, of course) use them for tasks that don't match their names, but doing so can lead to mistakes and make your code harder to interpret.

Here I will try to write a simple guide to registers for beginners.

Limitations:

  • Some instructions read and write data from/to specific registers, e.g. MUL uses (E)AX and (E)DX, so make sure you don't keep data your program still needs in registers that will be modified (or PUSH it beforehand and POP it afterwards).
  • Not every register gives you (direct) access to its 8-bit parts.
  • Some registers have cool specialized functions, so you need to use them properly in order not to compromise them.
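A minimal sketch of the MUL limitation above, assuming MASM/TASM syntax and a small-model DOS .EXE like the question's program (not tested here; the values 300, 250 and 7 are arbitrary). DX holds something we still need, so it is saved with PUSH before MUL overwrites it:

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.code
main:
    mov dx, 7           ; DX holds a value we still need later
    mov ax, 300
    mov bx, 250
    push dx             ; save it: MUL BX overwrites DX with the high word
    mul bx              ; DX:AX = AX * BX = 75000 = 0001h:24F8h
    mov si, ax          ; keep the low word of the product (24F8h)
    mov di, dx          ; keep the high word of the product (0001h)
    pop dx              ; DX = 7 again

    mov ax, 4C00h       ; DOS: terminate with return code 0
    int 21h
end main
```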

Remember that on a 386 or later, the registers listed below (except the segment registers) can also be used in an Extended (32-bit) form by adding E to the name (e.g. EAX is the 32-bit version of the Accumulator, whose lower 16 bits hold the 16-bit AX); a plain 8086 only has the 16-bit forms. Registers are listed in order of importance (from a beginner's perspective).

  1. AX - accumulator register - typically used as the primary operand and result. In simpler 8-bit designs, such as the 6502 in the Commodore 64, the Accumulator received most arithmetic results, while X and Y served as index registers. In short, its name is now mostly historical, and it will usually be the first register you use for virtually everything, for example arithmetic/logic operations and DOS (INT 21h) calls, which take the function number in AH (the high byte of AX). Apart from the instructions that require a value to be passed via AX, you can consider it your primary variable.
  2. DX - data register - not much to describe here, it stores data. I decided to list it right after AX because the two work together quite often; for example, 16-bit MUL (multiplication) stores the part of the result that overflows AX (the high word) in DX. So yes, it's "just" a secondary variable.
  3. CX - counter register - also quite simple, just another general-purpose register. This time, however, it is used specifically as a counter (as the name implies) by instructions such as LOOP and its variants and by REP-prefixed string instructions. As long as you are not using instructions that modify CX, you are free to use it for whatever you need (but always check whether there is a better spot for your data).
  4. BX - base register - long story short, in 32-bit code any general-purpose register can be used to address memory, so there BX lost its special purpose of "offsetting" memory addresses and became a "free" register that exists mostly for backward compatibility. (In 16-bit code like yours it still matters: BX is one of only four registers, along with BP, SI and DI, allowed inside square brackets.) Poor guy. Please use it; do not let the mists of history devour such a brave little t̶o̶a̶s̶t̶e̶r̶ register.
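To see AX, CX and BX in those roles together, here is a minimal sketch that prints five asterisks, assuming MASM/TASM syntax and a small-model DOS .EXE (not tested here; the label name print_star is made up). AH carries the DOS function number, CX is the LOOP counter, and BX is simply a free variable:

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.code
main:
    mov cx, 5           ; CX: counter used implicitly by LOOP
    mov bx, 0           ; BX: just a free general-purpose variable here
print_star:
    mov ah, 02h         ; AH: DOS function number (02h = write the character in DL)
    mov dl, '*'         ; DL: the character to print
    int 21h
    inc bx              ; count the characters printed (BX = 5 at the end)
    loop print_star     ; CX = CX - 1, jump back while CX != 0

    mov ax, 4C00h       ; DOS: terminate with return code 0
    int 21h
end main
```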

Forbidden registers! Yes, there are some.

  1. SP and BP - stack and base pointers - the default approach is not to touch them directly. SP is managed implicitly by the PUSH, POP, CALL and RET instructions, which maintain the stack and let you use functions, and BP is conventionally used to address the current function's stack frame. However, they can be used for heretical tricks bordering on dark magic; do you dare to desecrate the holy stack frame? Changing SP carelessly corrupts the stack (the next RET jumps to whatever happens to be on top of it) and can cause a real stack overflow. Do not do it unless you are really, really sure you want it (and know how). (Under certain conditions BP can become another "free" register like BX, but let's ignore that for now.)
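The conventional, non-heretical use of BP looks roughly like this: a minimal sketch assuming MASM/TASM syntax and a small-model DOS .EXE (not tested here; add_one is a hypothetical procedure). The caller pushes an argument, and the procedure reads it through BP without ever touching SP by hand:

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.code
; Hypothetical near procedure: returns in AX its stack argument plus 1.
add_one proc near
    push bp             ; save the caller's BP
    mov bp, sp          ; [bp] = old BP, [bp+2] = return address
    mov ax, [bp+4]      ; the word argument pushed by the caller
    inc ax
    pop bp              ; restore the caller's BP
    ret 2               ; return and discard the 2-byte argument
add_one endp

main:
    mov ax, 41
    push ax             ; pass the argument on the stack
    call add_one        ; AX = 42; SP is back where it was before the PUSH

    mov ax, 4C00h       ; DOS: terminate with return code 0
    int 21h
end main
```

Because every PUSH is matched by a POP (or by the `ret 2`), SP ends up exactly where it started.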

Now come the scary registers, with intimidating names and complicated instructions tied to them. Also, they have no high and low 8-bit halves available (but you can still extract a byte with a bit mask if needed).

  1. DI and SI - destination and source index - these two are used as pointers to memory blocks (buffers/arrays/strings etc.) for automated data copying/writing with REP-prefixed string instructions such as MOVSB and STOSB, or with other string/array instructions. If you are not using such confusing instructions, you can use DI and SI as data registers, with the limitation of not being able to (easily) select the low or high byte. A perfect choice for loop-based operations.
  2. Segment registers - CS, SS, DS, ES, FS, GS... (eek! Nowadays it's a whole army of them; FS and GS arrived with the 386) - although the story can be very long, here is the short explanation: the two main segments are the Code Segment and the Data Segment. The Code Segment contains code (surprising, right?) and any data you declared right in it. Segments are often used together with DI and SI for copying data; for example, LODSB (and its LODSW/LODSD variants) reads from DS:SI and STOSB writes to ES:DI, so in short ("flat") code that keeps its data in the code segment you first have to point DS (and ES) at that same segment. Otherwise you really should not change these registers, or you will read and write the wrong memory (and in protected mode a Segmentation Fault will kick you in the face).
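Here is how SI, DI, CX and the segment registers cooperate in a REP string instruction: a minimal sketch assuming MASM/TASM syntax and a small-model DOS .EXE with the question's A and B buffers (not tested here; A_LEN is a hypothetical name). It copies A into B with REP MOVSB and prints the copy:

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.data
A     db '  this   is  a  test  $'
A_LEN equ $ - A          ; hypothetical name: bytes in A, including the '$'
B     db 100 dup('$')
.code
main:
    mov ax, @data
    mov ds, ax           ; MOVSB reads from DS:SI ...
    mov es, ax           ; ... and writes to ES:DI, so ES must point at our data too
    cld                  ; DF = 0: SI and DI move forward after each byte
    mov si, offset A     ; source
    mov di, offset B     ; destination
    mov cx, A_LEN        ; number of bytes to copy
    rep movsb            ; copy one byte, decrement CX, repeat until CX = 0

    mov dx, offset B     ; DOS function 09h prints DS:DX up to the first '$'
    mov ah, 09h
    int 21h

    mov ax, 4C00h        ; DOS: terminate with return code 0
    int 21h
end main
```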

You can read more on pages like this neat guide.

As for your code, there are a few pieces of advice I can give you right now (for a total overhaul of your program you will need to wait until tomorrow or Monday, as I am kinda busy this weekend).

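Using SI/DI is better. In your code, A[bx] means (address of A) + (value in BX), so that addition happens on every access, while your classmate loads the address of A into SI once and then uses plain [si]. The CPU is a very simple creature, so it prefers simple [si] to doing math on the fly (on the original 8086, register-only addressing is cheaper than register plus displacement).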
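Until the full overhaul, here is a minimal sketch of an SI-based word counter, assuming MASM/TASM syntax, a small-model DOS .EXE, and that words are separated only by spaces (tabs are not handled); it is not tested here and the label names are made up. It counts a word whenever a non-space character follows a space or the start of the string, so unlike the question's first program it also counts a last word sitting directly before the '$'. The result stays in CX, as the assignment requires:

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.data
A db '  this   is  a  test$'   ; no space before '$' on purpose
.code
main:
    mov ax, @data
    mov ds, ax

    xor cx, cx          ; CX = word counter
    mov si, offset A    ; SI = pointer to the current character
    mov ah, ' '         ; AH = previous character; start as if preceded by a space

next_char:
    mov al, [si]        ; AL = current character
    cmp al, '$'
    je done             ; end of string
    cmp al, ' '
    je remember         ; a space never starts a word
    cmp ah, ' '
    jne remember        ; previous char was not a space: still inside a word
    inc cx              ; space followed by a non-space: a new word starts
remember:
    mov ah, al          ; the current character becomes the previous one
    inc si
    jmp next_char

done:                   ; CX = 4 for the string above
    mov ax, 4C00h       ; DOS: terminate with return code 0
    int 21h
end main
```

With the question's original string, trailing spaces included, the same loop should also finish with CX = 4.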

It is a good thing that you set the Data Segment yourself, as its initial value is up to the environment (system/CPU/emulator); when DOS starts an .EXE program, DS points at the Program Segment Prefix, not at your data. While it is a good habit to use segments for their dedicated content, for such a simple program you can feel free to store a bit of data in the code segment; it may even speed execution up by reducing memory jumps and address calculations. Then you can write MOV AX, CS followed by MOV DS, AX (you can't copy one segment register into another directly). The Code Segment is guaranteed to be the one holding your current code (otherwise it wouldn't execute).
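A minimal sketch of keeping a string in the code segment and pointing DS at it, assuming MASM/TASM syntax and a small-model DOS .EXE (not tested here; msg is a made-up name). The data sits after the exit call so the CPU never tries to execute it:

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.code
main:
    mov ax, cs
    mov ds, ax              ; DS = CS, done through AX
    mov dx, offset msg      ; DOS function 09h prints DS:DX up to the first '$'
    mov ah, 09h
    int 21h

    mov ax, 4C00h           ; DOS: terminate with return code 0
    int 21h

msg db 'hello from the code segment$'   ; placed after the exit, never executed
end main
```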

Zeroing registers can be made much cooler! Want to put 0 in AX? Use XOR AX, AX (AX = AX exclusive-or AX, which by logic always results in 0). Not only is it shorter in memory (2 bytes instead of 3 for MOV AX, 0; smaller code = faster loading), it also executes faster than moving an immediate value (and nowadays CPUs are smart enough to make it even faster). One catch: unlike MOV, XOR changes the flags, so don't put it between a CMP and the conditional jump that depends on it. An additional advantage is style points.
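A minimal sketch of both zeroing forms and of the flags catch, assuming MASM/TASM syntax and a small-model DOS .EXE (not tested here; the label skip is made up):

```asm
; Sketch only: MASM/TASM syntax, small-model DOS .EXE (assumed).
.model small
.stack 100h
.code
main:
    mov ax, 0           ; 3 bytes; leaves the flags untouched
    xor ax, ax          ; 2 bytes; same result, but sets ZF and clears CF and OF

    mov cx, 0
    mov ax, 5
    cmp ax, 6           ; ZF = 0 because AX != 6
    mov bx, 0           ; MOV leaves ZF alone, so JE still sees the CMP result
    je skip             ; not taken
    mov cx, 1           ; reached, as intended
skip:
    ; with "xor bx, bx" instead of "mov bx, 0", ZF would become 1,
    ; JE would jump, and CX would wrongly stay 0

    mov ax, 4C00h       ; DOS: terminate with return code 0
    int 21h
end main
```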

Counter as the result? Well, the counter register is meant as a loop counter, not a data counter. For storing data, DX or BX is more intuitive (BX more than DX, as almost nothing requires data in BX anymore, so it is "just a variable"). But if the result is required in CX, then you must obey, I guess.

Well, that's all for tonight. If something is unclear leave a comment.

PTwr