How to count number of vowels in a string in 8086 ALP?

Question

I wrote an assembly program that counts the number of vowels in a string entered by the user. Reading the string and calculating its length work fine, but the comparison of the characters does not work for the first two characters. This is my code:

.MODEL small
.STACK
.DATA
 input db 10,?
 length db ?
 count db ?
.CODE
.STARTUP 
 ;reading string
 mov dx,00h
 mov dx,offset input
 mov ah,0Ah
 int 21h 

 ;calculating length
 mov length,00h
 mov si,offset input+2

 ;checking for vowels
 loopi: cmp [si],'$'
    je next
    add length,01h
    inc si
    loop loopi
 next:
    mov cx,00h
    mov cl,length 

    mov si,offset input+2 
    mov count,00h
 counting:cmp [si],'a'
      je count1 
      cmp [si],'e'
      je count1
      cmp [si],'i'
      je count1
      cmp [si],'o'
      je count1
      cmp [si],'u'
      je count1
      inc si
      loop counting
      cmp cl,00h
      je exit
  count1:inc count 
      inc si
     loop counting 
 exit: 
.EXIT
 end

This code is not comparing/checking the first two characters of the string. Can someone help me with this as soon as possible? Any help would be very appreciated. Thank you so much.

Sep Roland · Accepted Answer · 2019-02-21T15:34:52.867

Reading of string and calculation of length are working fine. But when comparing the characters of the string, it is not working for the first two characters.

As it happens, it's precisely the comparing part that is fine! Your troubles start with the input, and they exist because you misunderstood what the question mark does in assembly programming.

input db 10,?
length db ?
count db ?

In all of these lines, the question mark ? represents a single byte that most, if not all, assemblers will initialize to the value 0. What you thus get is:

input  db 10, 0
length db 0
count  db 0

This is fine for length and count, but not for input, which is supposed to be the input buffer for the DOS buffered input function 0Ah. You haven't actually reserved the storage space that the function requires, so it's the memory for length, count, and whatever follows them that gets erroneously overwritten!

The solution is input db 10, 0, 10 dup (?). This allows inputting 9 characters. Why 9? Because DOS always appends a carriage return 13 to the input and that carriage return also needs a byte in this 10-byte storage space defined by 10 dup (?).
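
To make the layout concrete, here is a sketch of the same definition split over three lines, one per part of the structure that function 0Ah works with. The split and the comments are only for illustration; the single-line form above assembles to the same bytes.

```asm
; Buffer for DOS function 0Ah, written out field by field (illustrative sketch)
 input  db 10            ; byte 0: capacity of the text area, carriage return included
        db 0             ; byte 1: DOS stores the number of characters typed here (CR not counted)
        db 10 dup (?)    ; bytes 2..11: the typed characters followed by carriage return 13
```

For example, after the user types hello and presses Enter, the buffer starts with 10, 5, 'h', 'e', 'l', 'l', 'o', 13.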

Also this carriage return explains why your calculation of the length will fail. You are searching for "$" when you should be searching for the ASCII code 13.

Of course calculating the length is redundant since DOS informed you about it already. The 2nd byte of the input structure is the length.

mov cx, 0
mov cl, [input+1] ; length

All together:

.DATA
 input  db 10, 0, 10 dup (?)
 count  db ?
.CODE
.STARTUP 
 ;reading string
    mov  dx, offset input
    mov  ah, 0Ah
    int  21h 

 ;checking for vowels
    xor  cx, cx            ; Also clears the CX register like `mov cx, 0`
    mov  count, cl         ; Count = 0
    mov  si, offset input+2 
    mov  cl, [si-1]        ; length is 2nd byte
 counting:
    cmp  [si], 'a'
    je   count1 
    cmp  [si], 'e'
    je   count1
    cmp  [si], 'i'
    je   count1
    cmp  [si], 'o'
    je   count1
    cmp  [si], 'u'
    je   count1
    inc  si
    loop counting
    cmp  cl, 0             ; You can replace these 2 lines by
    je   exit              ; a single `jmp exit`
 count1:
    inc  count 
    inc  si
    loop counting 
 exit: 
.EXIT
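
One caveat about the listing above: if the user presses Enter without typing anything, DOS reports a length of 0. The `loop` instruction decrements CX before testing it, so starting with CX = 0 it wraps around to 0FFFFh and the scan keeps running far beyond the buffer. A minimal guard, assuming the same labels as in the listing, is to skip the scan when CX is zero:

```asm
    mov  cl, [si-1]        ; length is 2nd byte (CH was already cleared by xor cx, cx)
    jcxz exit              ; empty input: nothing to count, so skip the loop entirely
 counting:
    ; ... the comparisons continue exactly as in the listing above ...
```

`jcxz` is available on the 8086 and only reaches short distances, which is enough here because `exit` is a few dozen bytes away.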

A better solution

- not using the slow loop instruction
- minimizing memory access
- using string primitives like lodsb
- not failing if the string is empty!
- minimizing the amount of jumping around

is presented here:

 ;checking for vowels
    cld                ; For completeness because `lodsb` depends on it
    mov  si, offset input+2 
    mov  dl, -1
 vowel:
    inc  dl
 other:
    lodsb              ; This is `mov al, [si]` followed by `inc si`
    cmp  al, 'a'
    je   vowel 
    cmp  al, 'e'
    je   vowel
    cmp  al, 'i'
    je   vowel
    cmp  al, 'o'
    je   vowel
    cmp  al, 'u'
    je   vowel
    cmp  al, 13
    jne  other
    mov  count, dl
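
Neither listing displays the result. As a usage sketch that is not part of the original answer, the count can be shown with DOS function 02h (write the character in DL). It assumes the 10-byte buffer defined above, so the count is at most 9 and fits in a single digit. The line feed is printed first on the assumption that the cursor is still at the start of the input line after function 0Ah returns.

```asm
 ;displaying the result (sketch, assumes count <= 9)
    mov  dl, 10            ; line feed, so the digit does not land on top of the typed text
    mov  ah, 02h           ; DOS function 02h: write the character in DL
    int  21h
    mov  dl, count         ; 0..9 with the 9-character input limit
    add  dl, '0'           ; convert the number to its ASCII digit
    mov  ah, 02h           ; reload AH in case the previous call changed it
    int  21h
```

A larger buffer would need a proper number-to-decimal conversion instead of the single `add dl, '0'`.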

This is tagged `emu8086`. `loop` isn't inherently slow on real 8086, or presumably on an emulator. It's only slow if you could have used fewer (or smaller) instructions like `cmp`/`jcc` instead of using a separate reg as a loop counter. Of course it's not generally useful on *modern* x86. But neither is `lodsb` (3 uops and has a false dependency on the old RAX). Haswell and later have efficient-ish `lodsd` and `lodsq` (2 uops), but recommending `lodsb` along with avoiding `loop` is not sensible. Of course 8086 doesn't have `movzx` to avoid partial register false dependencies... — Peter Cordes, Feb 22 '19 at 03:24
And if you want to talk about efficiency, if we had 386 then we could efficiently check a 32-bit bitmap with `c - 'a'` as the index. See [User Appreciation Challenge #1: Dennis ♦](//codegolf.stackexchange.com/a/123458) for a 32-bit immediate bitmap of consonants that I test with `bt`, after upcasing and range-shifting to get the 0..25 alphabet position of letters. I don't see an easy way to use that for 8086 that's more efficient than a chain of cmp/jcc *on* 8086 (where branch misses aren't a thing). — Peter Cordes, Feb 22 '19 at 03:44
It was really helpful for a beginner like me. I tried implementing this in my program and it worked. Thanks a lot. — saketha yellanki, Mar 30 '19 at 08:40
