The program will ask for 2 strings (with a maximum size of 150 characters, for example).

INPUT

String 1: "Hello this is the first string"

String 2: "Woops this string is different and longer!"

OUTPUT

Number of equal characters = 26
Number of different characters = 15

I have in mind that we don't have to "calculate" the different characters as we can get the size of the longest string and subtract the number of equal characters but I don't know how to do this first comparison (the number of equal characters).

How can I do this? Could I do a macro for that comparison?

UPDATE

I am using EMU8086, MASM. I don't want to use string instructions.

This is my current code, following the instructions in the comments, but I still can't make it work.

.model small
.stack 100h
.data
    Text1 DB "Please enter the first phrase: ",13,10,'$'
    Text2 DB "Please enter the second phrase: ",13,10,'$'
    
    max1 DB 151 ;We add one to 150 for the ENTER
    charRead1 DB 0       
    
    max2 DB 151 ;We add one to 150 for the ENTER
    charRead2 DB 0
    
    TextoEquals DB "Number of equal characters: ",13,10,'$'
    TextoDiffer DB "Number of different characters: ",13,10,'$'
     
    linefeed DB 13, 10, "$" 
    size1 dw 0000h
    size2 dw 0000h
.code

AllMacros:
        
    MagicFunction MACRO
        cld                     ; Clear direction flag
        xor bx, bx              ; Reset counter
        mov si, OFFSET max1     ; Pointer to shorter string
        mov dx, size1           ; Length of the shorter string
        NextChar: 
            lodsb                   ; Fetch next character from shorter string
            mov di, offset max2     ; Pointer to longer string
            mov cx, size2              ; Length of the longer string
            repne 
            scasb
            jne NotFound 
            inc bx                  ; Increment counter
            mov byte ptr [di-1], 0  ; Strike by replacing with zero   
            
            NotFound:
                dec dx               ; Repeat for all characters in shorter string
                jnz NextChar   
    ENDM
    
    
    PrintText MACRO str
        MOV AH, 09h
        LEA DX, str
        INT 21h
    ENDM 
    
    PrintCls MACRO
        mov ah, 09h
        mov dx, offset linefeed
        int 21h 
    ENDM


Inicio:
    mov ax, @data
    mov ds, ax
    
    PrintText Text1
    
    mov ah, 0Ah        
    lea dx, max1 
    mov size1, offset max1
    int 21h
                           
    PrintCls  
    
    PrintText Text2
    
    mov ah, 0Ah        
    lea dx, max2 
    mov size2, offset max2
    int 21h  
    
    PrintCls  
    
    MagicFunction
    
    PrintText TextoEquals
    
    mov ah, 9
    lea dx, bx
    int 21h
     
    PrintCls
    
    PrintText TextoDiffer
    
    mov ah, 9
    lea dx, dx
    int 21h
    
END Inicio 
lLauriix
  • If you're looking for matches at the *same* position, like `count += (s1[i] == s2[i]);`, then on modern x86 you could do this efficiently with SSE2 as in [Count number of matching bytes between two \_m128i SIMD vectors](https://stackoverflow.com/q/67274676) along with checking each string for zero bytes like a SIMD strlen would. But that seems not to be what you're doing? And I guess you're not worried about performance, just coming up with a scalar algorithm? – Peter Cordes Mar 30 '22 at 09:33
  • How are you defining "equal characters" exactly, so that you get 27 out of the 30 total in String1 being "equal"? Is this counting every character that appears anywhere in the other string? Like make a set of all characters in one string, and `for (i=0 ;i – Peter Cordes Mar 30 '22 at 09:39

2 Answers

How would you do this on paper?

You would in turn consider every character from the shorter string and try to locate it in the longer string. If you found the character then you would increment your counter, and you would strike the character from the longer string so the character could not be found a second time.

From:

 -- ----------- --------- ----
Hello this is the first string

Woops this string is different and longer!
 -  ----------------- -- --- --    -

To:

Hello this is the first string (4 characters remain)
==> 30 - 4 = 26 are 'equal'

Woops this string is different and longer! (16 characters remain)
==> 16 are 'different'

In assembly code it could look like this:

S1 db "Hello this is the first string"
S2 db "Woops this string is different and longer!"

  ...

  cld                ; Clear direction flag
  xor bx, bx         ; Reset counter
  mov si, S1         ; Pointer to shorter string
  mov dx, 30         ; Length of the shorter string
NextChar:
  lodsb              ; Fetch next character from shorter string
  mov di, S2         ; Pointer to longer string
  mov cx, 42         ; Length of the longer string
  repne scasb
  jne NotFound 
  inc bx             ; Increment counter
  mov byte [di-1], 0 ; Strike by replacing with zero
NotFound:
  dec dx             ; Repeat for all characters in shorter string
  jnz NextChar

scasb compares AL with the byte at [es:di], sets the flags accordingly, and increments the DI register (provided the direction flag is clear).

repne scasb keeps comparing for as long as the count in CX is not exhausted and the zero flag is not set.

I have in mind that we don't have to "calculate" the different characters as we can get the size of the longest string and subtract the number of equal characters...

That is correct.
The above code should yield the following results:

Number of equal characters = 26
Number of different characters = 42 - 26 = 16

Sep Roland
  • For longer strings, this would run much faster with a histogram of counts, indexing the right position and incrementing (first loop) or decrementing (second loop). Linear search of the long string for every char of the short string has `O( N * M )` run time and destroys the long string, vs. `O( N + M )` time and a constant 256 bytes (or words) of extra space for counters. – Peter Cordes Apr 04 '22 at 02:10
  • I wonder if the question got 27 by including a terminating newline or possibly a `0` byte? If you don't cross off matches as you consume them, you just need a set (e.g. bitmap that you check with `bt [di], ax` / `adc dx, 0`). But then every byte of the short string, except the capital H, is "equal", so that doesn't explain 27 either. – Peter Cordes Apr 04 '22 at 02:15
  • Thanks for all your input @PeterCordes ! You are right, it should be 26 instead of 27. Just a naughty typo from me. – lLauriix Apr 06 '22 at 08:45
  • Thanks for your answer! The only problem I have is that I wanted the program to ask for those strings and I can't really make your code work like that. I usually do something like `Text DB "Please enter a phrase: ",13,10,'$'` and `S1 db 151` in the data section to ask for an input (from the keyboard) of a certain size. I don't want to use string instructions though @SepRoland – lLauriix Apr 06 '22 at 08:52
  • @lLauriix: the code here just needs pointers and lengths in registers; should be straightforward to adapt it to setting those pointers to the starts of buffers. Obviously writing code to prompt and read user input is a separate problem, which you can search for if you're not familiar with appropriate system calls for whatever OS you're running on (or BIOS or UEFI if a bootloader). – Peter Cordes Apr 06 '22 at 12:29
  • @PeterCordes I'm not having problems with reading the user's input but with using that input on his code. The line `mov si, S1` is returning the following error "(33) operands do not match: 16 bit register and 8 bit address". So basically I have to study this a little bit more and try to understand why it's happening. Thank you again for your help. P. S. I'm using emu8086 to learn assembly – lLauriix Apr 08 '22 at 15:49
  • @lLauriix: emu8086 uses MASM-style syntax, so you need `mov si, OFFSET S1` to get the address in a register. The code in this answer is for NASM, since you didn't specify a dialect of x86 assembly in the question, Sep picked one of the better ones. Also note that you'll need `byte ptr` not `byte`. – Peter Cordes Apr 08 '22 at 15:52
  • @PeterCordes Oh right! Gonna try and make it work with that, thank you so much! – lLauriix Apr 08 '22 at 15:56
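
The histogram idea from Peter Cordes' comment above could look roughly like the sketch below, written for emu8086 (MASM syntax) and without string instructions. It assumes the `chars1`/`size1` and `chars2`/`size2` variables used in the full program further down, plus a hypothetical 256-byte table named `hist` that is not part of the original code. Byte-sized counters are enough here only because each input is limited to 150 characters.

```asm
; Hypothetical addition to the .data section (not in the original program):
;   hist DB 256 dup(0)        ; one byte counter per character code

  ; Pass 1: count every character of the first string.
  mov  si, offset chars1
  mov  cx, size1
  jcxz CountDone              ; Empty input: nothing to count
CountLoop:
  mov  bl, [si]
  mov  bh, 0                  ; BX = character code (0..255)
  inc  byte ptr hist[bx]      ; One more occurrence of this character
  inc  si
  loop CountLoop
CountDone:

  ; Pass 2: a character of the second string matches if its counter
  ; is still non-zero; each match consumes one occurrence.
  xor  dx, dx                 ; DX = number of equal characters
  mov  si, offset chars2
  mov  cx, size2
  jcxz MatchDone              ; Empty input: nothing to match
MatchLoop:
  mov  bl, [si]
  mov  bh, 0
  cmp  byte ptr hist[bx], 0
  je   NoMatch                ; No unmatched occurrence left
  dec  byte ptr hist[bx]      ; Consume one occurrence
  inc  dx
NoMatch:
  inc  si
  loop MatchLoop
MatchDone:
  mov  bx, dx                 ; Leave the count in BX, like the scanning version
```

Unlike the scanning version, this leaves both strings intact and touches each character only once. The table has to be all zeros before the first pass, so it would need clearing if the comparison were run more than once.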

Thank you for including code in your question and tagging it appropriately. It allows me to write a full review this time. Mind you, the other answer is not a half answer as you seem to imply. That answer is on a par with your question at the time.


How can I do this? Could I do a macro for that comparison?

The code that I already provided in my previous answer, and that you have adopted, is not really meant to be used as a macro. If anything, it could become a subroutine, but I see no benefit from doing that in your program. Just inline the code.

Inicio:
    mov ax, @data
    mov ds, ax

As I wrote in my previous answer, the scasb instruction uses the ES segment register. You should set it up the same as the DS segment register:

Inicio:
    mov ax, @data
    mov ds, ax
    mov es, ax

Inputting

The Q&A "How buffered input works" explains in great detail how the DOS.BufferedInput function 0Ah works.
Where you wrote:

max1 DB 151 ;We add one to 150 for the ENTER
charRead1 DB 0       

you forgot to reserve room for the actual characters that you would like to receive from the user. This is how it needs to be:

max1      DB 151
charRead1 DB 0       
chars1    DB 151 dup(0)

When you invoke this DOS function 0Ah, you will also receive in the charRead1 byte the length of the actual input. That is the value that you need to store in your size1 variable (not the address of the input structure like your code is doing now!):

mov ah, 0Ah        
lea dx, max1 
mov size1, offset max1   ; <<<< is wrong
int 21h

And of course, you can only do this after the DOS call was made:

mov  dx, offset max1  ; Address of the input structure
mov  ah, 0Ah
int  21h
mov  al, charRead1    ; Length of the string
mov  ah, 0            ; Extending is needed because size1 is a word-sized variable
mov  size1, ax

Same goes for the second string.

Outputting

The Q&A "Displaying numbers with DOS" explains in great detail how you can print the value from the 16-bit register AX.

mov ah, 9
lea dx, bx
int 21h

When you use the DOS.PrintString function 09h, DOS expects in DX a pointer to a '$'-terminated string of characters. The BX register in your code holds a number, which you still need to convert into a string of digits. Note also that lea dx, bx is not a valid instruction: LEA needs a memory operand as its source. The 'method 2' snippet from the link does the conversion and also displays the characters immediately using the DOS.PrintCharacter function 02h. The Q&A "Displaying characters with DOS or BIOS" has info about many output functions. You only need to insert the code snippet at the ellipsis.

mov  ax, bx
push bx             ;(0)
...
pop  bx             ;(0)

And since this is my original code, I'll do the copying for you...

    mov     ax,bx
    push    bx             ;(0)
    mov     bx,10          ;CONST
    xor     cx,cx          ;Reset counter
.a: xor     dx,dx          ;Setup for division DX:AX / BX
    div     bx             ; -> AX is Quotient, Remainder DX=[0,9]
    push    dx             ;(1) Save remainder for now
    inc     cx             ;One more digit
    test    ax,ax          ;Is quotient zero?
    jnz     .a             ;No, use as next dividend
.b: pop     dx             ;(1)
    add     dl,"0"         ;Turn into character [0,9] -> ["0","9"]
    mov     ah,02h         ;DOS.DisplayCharacter
    int     21h            ; -> AL
    loop    .b
    pop     bx             ;(0)

(0) Preserving BX is necessary because you need the value a second time to display the number of different characters.
You knew that you didn't have to "calculate" the different characters as you could get the size of the longest string and subtract the number of equal characters. But why didn't you do that subtraction in your program?

    mov     ax,size2       ;SizeLongestString
    sub     ax,bx          ; minus EqualCharacters

    mov     bx,10          ;CONST
    xor     cx,cx          ;Reset counter
.a: xor     dx,dx          ;Setup for division DX:AX / BX
    div     bx             ; -> AX is Quotient, Remainder DX=[0,9]
    push    dx             ;(1) Save remainder for now
    inc     cx             ;One more digit
    test    ax,ax          ;Is quotient zero?
    jnz     .a             ;No, use as next dividend
.b: pop     dx             ;(1)
    add     dl,"0"         ;Turn into character [0,9] -> ["0","9"]
    mov     ah,02h         ;DOS.DisplayCharacter
    int     21h            ; -> AL
    loop    .b

So it seems that we have inserted the code to convert and display a number twice. That's a good reason to put it in a subroutine and then just call it twice. See below.


Putting everything together, we get the following program, made for emu8086 (MASM):

.model small
.stack 100h
.data
    Text1       DB "Please enter the shorter phrase: ",13,10,'$'
    Text2       DB "Please enter the longer phrase: ",13,10,'$'
    
    max1        DB 151
    charRead1   DB 0       
    chars1      DB 151 dup(0)

    max2        DB 151
    charRead2   DB 0       
    chars2      DB 151 dup(0)
    
    TextoEquals DB "Number of equal characters: ",13,10,'$'
    TextoDiffer DB "Number of different characters: ",13,10,'$'
     
    linefeed    DB 13, 10, "$" 
    size1       DW 0
    size2       DW 0

.code
AllMacros:
    PrintText MACRO str
        MOV AH, 09h
        LEA DX, str
        INT 21h
    ENDM 
    
    PrintCls MACRO
        mov ah, 09h
        mov dx, offset linefeed
        int 21h 
    ENDM

Inicio:
  mov  ax, @data
  mov  ds, ax
  mov  es, ax
    
  PrintText Text1
    
  mov  dx, offset max1
  mov  ah, 0Ah
  int  21h
  mov  al, charRead1
  mov  ah, 0
  mov  size1, ax

  PrintCls  
  PrintText Text2
    
  mov  dx, offset max2
  mov  ah, 0Ah
  int  21h
  mov  al, charRead2
  mov  ah, 0
  mov  size2, ax

  PrintCls  

  cld                     ; Clear direction flag
  xor  bx, bx             ; Reset counter
  mov  si, offset chars1  ; Pointer to shorter string
  mov  dx, size1          ; Length of the shorter string
NextChar:
  lodsb                   ; Fetch next character from shorter string
  mov  di, offset chars2  ; Pointer to longer string
  mov  cx, size2          ; Length of the longer string
  repne scasb
  jne  NotFound 
  inc  bx                 ; Increment counter
  mov  byte ptr [di-1], 0 ; Strike by replacing with zero
NotFound:
  dec  dx                 ; Repeat for all characters in shorter string
  jnz  NextChar

  PrintText TextoEquals
    
  mov  ax, bx
  call DisplayAX

  PrintCls
  PrintText TextoDiffer
    
  mov  ax, size2
  sub  ax, bx
  call DisplayAX

  mov  ax, 4C00h         ; DOS.Terminate
  int  21h
    
DisplayAX:
    push    bx             ;(0)
    mov     bx,10          ;CONST
    xor     cx,cx          ;Reset counter
.a: xor     dx,dx          ;Setup for division DX:AX / BX
    div     bx             ; -> AX is Quotient, Remainder DX=[0,9]
    push    dx             ;(1) Save remainder for now
    inc     cx             ;One more digit
    test    ax,ax          ;Is quotient zero?
    jnz     .a             ;No, use as next dividend
.b: pop     dx             ;(1)
    add     dl,"0"         ;Turn into character [0,9] -> ["0","9"]
    mov     ah,02h         ;DOS.DisplayCharacter
    int     21h            ; -> AL
    loop    .b
    pop     bx             ;(0)
    ret

END Inicio 

Never forget to exit your program properly. For DOS, the preferred way is to use the DOS.Terminate function 4Ch.
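
The program above relies on the user entering the shorter phrase first, because it treats size2 as the length of the longest string. The strike-out count itself does not depend on which string is scanned, so only the final subtraction needs to know the longer length. A minimal sketch, assuming the same `size1`, `size2`, `BX` and `DisplayAX` as above (`HaveLongest` is a hypothetical label), could replace the `mov ax, size2` / `sub ax, bx` / `call DisplayAX` lines:

```asm
  mov  ax, size1
  cmp  ax, size2
  jae  HaveLongest        ; AX already holds the larger length
  mov  ax, size2          ; Otherwise the second string is the longer one
HaveLongest:
  sub  ax, bx             ; Longest length minus equal characters
  call DisplayAX
```

With this in place, the two prompts would no longer need to ask for a "shorter" and a "longer" phrase.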


[Recent addition to the question]

I don't want to use string instructions

That's fine. Simply replace the lodsb and repne scasb string instructions with equivalent code:

  xor  bx, bx             ; Reset counter
  mov  si, offset chars1  ; Pointer to shorter string
  mov  dx, size1          ; Length of the shorter string
NextChar:
  mov  al, [si]           ; Fetch next character from shorter string
  inc  si
  mov  di, offset chars2  ; Pointer to longer string
  mov  cx, size2          ; Length of the longer string
Scan:
  cmp  al, [di]
  je   Found
  inc  di
  loop Scan
  jmp  NotFound
Found:
  inc  bx                 ; Increment counter
  mov  byte ptr [di], 0   ; Strike by replacing with zero
NotFound:
  dec  dx                 ; Repeat for all characters in shorter string
  jnz  NextChar

Strictly speaking, it's no longer necessary to set up ES, but it won't harm the program if you leave it in.
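
One edge case is worth guarding against: if the user just presses ENTER, the corresponding length is zero. With size1 = 0, `dec dx` wraps around to 0FFFFh and the outer loop keeps running; with size2 = 0, `loop Scan` starts from CX = 0 and runs 65536 times over memory that is not part of the string. The following variant of the loop above is a sketch of one possible guard, where `CompareDone` is a hypothetical label that is not in the original code:

```asm
  xor  bx, bx               ; Reset counter (stays 0 for empty input)
  mov  dx, size1            ; Length of the shorter string
  test dx, dx
  jz   CompareDone          ; First phrase empty: skip the comparison
  cmp  word ptr size2, 0
  je   CompareDone          ; Second phrase empty: skip the comparison
  mov  si, offset chars1    ; Pointer to shorter string
NextChar:
  mov  al, [si]             ; Fetch next character from shorter string
  inc  si
  mov  di, offset chars2    ; Pointer to longer string
  mov  cx, size2            ; Length of the longer string
Scan:
  cmp  al, [di]
  je   Found
  inc  di
  loop Scan
  jmp  NotFound
Found:
  inc  bx                   ; Increment counter
  mov  byte ptr [di], 0     ; Strike by replacing with zero
NotFound:
  dec  dx                   ; Repeat for all characters in shorter string
  jnz  NextChar
CompareDone:
```

After `CompareDone`, BX holds the number of equal characters exactly as before, so the display code can follow unchanged.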

Sep Roland
  • This is an amazing explanation, I really can't thank you enough. I should have included the code before anything. I was really confused about saving the introduced strings and that `When you invoke this DOS function 0Ah, you will also receive in the charRead1 byte the length of the actual input.` felt like magic to me. I realized that the order of the .data section matters! (probably silly, but really mindblowing to me) Basically all the inputting was a nightmare, I didn't know how to get the size and where was my string stored. – lLauriix Apr 11 '22 at 00:07
  • Sorry for not being really clear about what I wanted because since I don't fully understand assembly, it is hard to explain. That's why I couldn't do that subtraction either, I didn't know how to get the size of my own input string. Like a dog chasing its own tail... So again, thank you so much for your answer!!!! I will give you that +rep as soon as it lets me do it. @SepRoland – lLauriix Apr 11 '22 at 00:14