
I am writing a program for DOSBox (using TASM). I need to print an input string one character per line. I figured out how to read the string, but the output is wrong.

There are 2 problems:

  1. I do not know how to get the length of the string, so there are too many blank lines.
  2. There is not one character per line in the output.

My code:

.model small
.data
    message db 'String: $'
    string  db 10 dup(' '), '$'
.stack 256h
.code
main:
    mov ax, @data
    mov ds, ax

    lea dx, message ; load message to dx
    mov ah, 09h ; output message
    int 21h

    xor dx, dx
    lea dx, string ; input string
    mov ah, 0Ah
    int 21h

    ; crlf
    mov dl, 10
    mov ah, 02h
    int 21h
    mov dl, 13
    mov ah, 02h
    int 21h

    ; output string char by char
    mov si, 0
    mov cx, 10 ; a number of loops, but how to get the length of the string?
output:
    lea dx, string[si]
    mov ah, 09h
    int 21h

    mov dl, 10
    mov ah, 02h
    int 21h
    mov dl, 13
    mov ah, 02h
    int 21h

    inc si
    loop output

    mov ah,4ch
    int 21h
end main

Sep Roland

2 Answers

2

The other answer provided a working solution that is based on a string terminator like 0 or 13.
This answer chooses to use the string length as it is already available from DOS, and also more in line with what the OP is asking about.


.data
  message db   'String: $'
  string db 10 dup(' '), '$' 
.stack 256h

"I figured out how to input string, ..." Not really!
That string line is supposed to define the input structure for the DOS.BufferedInput function 0Ah. DOS expects to find the storage length in the first byte and will return the length of the actual input in the second byte. The post "How buffered input works" has the details.
What you have written for string translates to a 10-byte region of memory entirely filled with the number 32 (ASCII of ' '), and optimistically followed by a dollar character (the linked post explains why this is not a good idea). Because the first byte is 32, DOS is permitted to use 34 bytes (2 header bytes plus 32 bytes of storage) for input, far more than the 11 bytes you reserved. You have a serious buffer overflow!
If you want to allow for a user input of 10 characters, then the correct definition is string db 11, 0, 11 dup(0).
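To make the layout of that structure easier to see, here is a minimal sketch of the same definition with each field on its own line. The names MAXCHARS and inbuf are hypothetical and only used for illustration; the byte values match string db 11, 0, 11 dup(0).

```asm
; Sketch only: MAXCHARS and inbuf are illustrative names, not from the question.
MAXCHARS equ 10                     ; characters the user may type

.data
inbuf   db MAXCHARS+1               ; byte 0: storage size, including the final CR
        db 0                        ; byte 1: DOS stores the count of typed characters here (CR not counted)
        db MAXCHARS+1 dup(0)        ; bytes 2..: the characters, followed by a CR (13)
```

With this layout the first typed character is at inbuf+2 and the count is at inbuf+1.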


There are 2 problems:

  1. I do not know how to get the length of the string, so there are too many blank lines.
  2. There is not one character per line in the output.

My replies to these:

  1. DOS already gave you the length of the string in the second byte of the string input structure. Just fetch it:
         xor  cx, cx
         mov  cl, string[1]
  2. Your code (lea dx, string[si] / mov ah, 09h / int 21h) is using a DOS function (09h) that outputs a whole string of characters, so each iteration prints everything from the current position up to the next $. Use a single character output function instead:
         mov  dl, string[si]
         mov  ah, 02h
         int  21h

Your code (with comments)

 lea  dx, message
 mov  ah, 09h
 int  21h
 lea  dx, string
 mov  ah, 0Ah
 int  21h
 mov  dl, 10             ; (*)
 mov  ah, 02h
 int  21h
 ; output string char by char 
 mov  si, 2              ; Characters start at offset 2
 xor  cx, cx
 mov  cl, string[1]      ; Count of characters
 jcxz done
output:
 mov  dl, string[si]     ; Fetch one character
 mov  ah, 02h
 int  21h                ; Print one character
 mov  dl, 13
 mov  ah, 02h
 int  21h                ; Print carriage return
 mov  dl, 10
 mov  ah, 02h
 int  21h                ; Print linefeed
 inc  si                 ; Move to next character
 loop output             ; Repeat for all characters
done:

(*) At the conclusion of the DOS.BufferedInput function 0Ah, the cursor will be in the first column on the current row. You don't need to output a carriage return (13); a linefeed (10) alone will do. This is certainly true for regular MS-DOS, but as @ecm wrote in her comment, exceptions to the rule exist.
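If the program should also behave on the DOS variants mentioned in that comment, one cautious option is to emit both bytes after the input call. This is a sketch under that assumption; on regular MS-DOS the extra carriage return is harmless.

```asm
        ; Sketch: move to the start of the next line regardless of
        ; where function 0Ah left the cursor.
        mov  dl, 13             ; carriage return: back to column 1
        mov  ah, 02h            ; DOS.PrintChar
        int  21h
        mov  dl, 10             ; linefeed: down one row
        mov  ah, 02h
        int  21h
```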


My code

The above output loop does 3 system calls per iteration. If we reduce this number to 1, the loop can run about 10% faster [1].
On each iteration we copy the current character to a $-terminated string that has the carriage return and linefeed bytes included. A single invocation of the DOS.PrintString function 09h then does the outputting. This speed gain does come with one drawback: if the input string has an embedded $ character, it will not get displayed.

[1] True as long as the screen does not have to scroll, because screen scrolling is comparatively very slow.

.data
message db 'String: $'
string  db 11, 0, 11 dup(0) 
TheChar db 0, 13, 10, '$'

 ...

 lea  dx, message
 mov  ah, 09h
 int  21h
 lea  dx, string
 mov  ah, 0Ah
 int  21h
 mov  dl, 10             ; (*)
 mov  ah, 02h
 int  21h
 ; output string char by char 
 mov  si, 2              ; Characters start at offset 2
 lea  dx, TheChar
 xor  cx, cx
 mov  cl, string[1]      ; Count of characters
 jcxz done
output:
 mov  al, string[si]     ; Fetch one character
 mov  TheChar, al
 mov  ah, 09h
 int  21h                ; Print one character plus CR plus LF
 inc  si                 ; Move to next character
 loop output             ; Repeat for all characters
done:
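
The $ limitation of function 09h can be sidestepped while still making one system call per character. The sketch below is not part of the answer's code: it swaps in DOS function 40h (write to file or device) with handle 1 (standard output), which takes an explicit byte count instead of a terminator. The label names output40 and done40 are hypothetical, and string and TheChar are assumed to be defined as above (the trailing '$' in TheChar is simply not used).

```asm
        mov  si, 2              ; Characters start at offset 2
        xor  cx, cx
        mov  cl, string[1]      ; Count of characters
        jcxz done40             ; Nothing to print for an empty input
output40:
        mov  al, string[si]     ; Fetch one character
        mov  TheChar, al
        push cx                 ; Function 40h needs CX for the byte count
        mov  bx, 1              ; Handle 1 = standard output
        mov  cx, 3              ; Character + CR + LF
        lea  dx, TheChar
        mov  ah, 40h            ; DOS write to file or device
        int  21h
        pop  cx                 ; Restore the loop counter
        inc  si                 ; Move to next character
        loop output40           ; Repeat for all characters
done40:
```

Because the length is passed in CX, a $ in the input is written like any other byte.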
Sep Roland
  • A few problems: Some DOSes (ZDOS) do not output a Carriage Return after the input obtained from function 0Ah. It is valid to enter an empty string in which case your loops will get a zero in `cx` initially and then loop for 65_536 iterations. And the last loop won't work correctly if you get a dollar sign as part of the input, in which case the dollar sign's iteration will output the empty string instead of '$',13,10. – ecm Oct 23 '22 at 14:12
  • @ecm Very good remarks! I did not consider an embedded $ character. I will correct the code in case of an empty input, although most, if not all, of the questions asked on Stack Overflow seem to blatantly ignore all forms of invalid/special-case user input. – Sep Roland Oct 23 '22 at 14:21
1

Here is how to print a string char by char:

Declare the string such as:

str: db "Hello, world!", 0

Then, to get the length of the string you should search for the null terminator at the end of the string. Either way the code could be something like:

;
; output string char by char
; input: pointer to string in si
;
printstr:
    mov ah, 0x02 ; get 0x02 inside ah to use with int 21h
printstrloop:
    cmp byte[si], 0 ; check if we are at the end of string
    je printstrend ; end if it is
    mov dl, byte[si] ; get the char inside dl to print it
    int 0x21 ; output

    mov dl, 0x0d ; newline character CR
    int 0x21 ; do newline

    mov dl, 0x0a ; newline character Lf
    int 0x21 ; do newline

    inc si ; increase the pointer
    jmp printstrloop ; loop

printstrend:
    ret
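
The answer mentions getting the length by searching for the terminator. Since the question uses TASM, here is a minimal sketch of that search in TASM/MASM syntax rather than NASM. The routine name strlen and its labels are hypothetical.

```asm
; Sketch of a hypothetical helper: length of a zero-terminated string.
; In:  DS:SI -> string
; Out: CX = number of bytes before the terminating 0 (SI is preserved)
strlen:
        push si
        xor  cx, cx
strlen_next:
        cmp  byte ptr [si], 0   ; reached the terminator?
        je   strlen_done
        inc  si
        inc  cx
        jmp  strlen_next
strlen_done:
        pop  si
        ret
```

Note that TASM writes the size override as byte ptr [si] where NASM writes byte [si].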

You can then do the input like:

Reserve space for string

length: resb 1
actualLength: resb 1
str: resb 255 ; I don't know TASM syntax. This is in NASM

Get the input

mov byte[length], 0xFF
mov ah, 0x0a
mov dx, length
int 0x21

Then you have a CR-terminated string at str and its length at actualLength. Point si at str and call printstr with cmp byte[si], 0x0d in place of the check for the null terminator.
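
An alternative that keeps the null-terminated print loop unchanged is to use the returned length to overwrite the trailing CR with a 0. This is a sketch in TASM/MASM syntax, assuming the input buffer is defined as string db 11, 0, 11 dup(0) and has already been filled by function 0Ah.

```asm
        xor  bx, bx
        mov  bl, string[1]              ; BX = number of characters typed (CR not counted)
        mov  byte ptr string[bx+2], 0   ; overwrite the trailing CR with a 0 terminator
        lea  si, string+2               ; SI -> first character, ready for a 0-terminated print loop
```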

  • One of his big problems is that he's using `int 21h/ah=0ah` (to read a string from input) incorrectly: the buffer he has doesn't take into account that the maximum length of the input is the first byte and the number of characters actually read is in the second byte. That string isn't guaranteed to be zero terminated and is likely to have a carriage return on the end. I usually just use the length byte returned from the call and use it to replace the carriage return with a 0 (NUL character) or, if using DOS Write String later on, a `$`. – Michael Petch Oct 21 '22 at 10:33
  • You should write a CR LF (13 10 or 0xD 0xA) after each character, not only LF. On DOS terminals only LF generally doesn't return the carriage. – ecm Oct 21 '22 at 10:36
  • @MichaelPetch I tried to do something about the input in the edit but I am not sure if it is correct – Özgür Güzeldereli Oct 21 '22 at 10:48
  • There are 2 bytes before the string. The first byte is the maximum number of characters to read, and the byte after it is filled in with the length of the string when the string is read (minus the carriage return). The buffer itself is terminated by a carriage return. There are other questions about this issue with good answers. An example: https://stackoverflow.com/a/56622882/3857942 – Michael Petch Oct 21 '22 at 10:53
  • @ÖzgürGüzeldereli [The buffer format for int 21h function 0Ah](http://fd.lod.bz/rbil/interrup/dos_kernel/210a.html) is one byte buffer length (X), one byte actually used length (Y), and then X bytes of buffer space. The actual data that you receive is stored starting at `ds:dx` plus 2. It is always terminated by a CR, but you can use the length indicated by Y instead if you prefer. (Y is always less than X.) – ecm Oct 21 '22 at 10:54
  • @ecm is the format okay now? – Özgür Güzeldereli Oct 21 '22 at 11:37
  • @ÖzgürGüzeldereli `length: resb 1` is only correct if you write to this byte to initialise it before calling function 0Ah. You need to pass the length DOS is allowed to fill. – ecm Oct 21 '22 at 12:05
  • Well, what is byte here? ```cmp byte[si], 0``` –  Oct 21 '22 at 16:10
  • @hacema4275 I don't know the TASM way of doing it. It is supposed to get one byte from memory where ```[si]``` points to. – Özgür Güzeldereli Oct 21 '22 at 16:14
  • @hacema4275 apparently it is the same in TASM. ```byte``` there just tells the assembler the size of our operation. We just want to move one byte hence one character. – Özgür Güzeldereli Oct 21 '22 at 16:29
  • @ÖzgürGüzeldereli You shouldn't mix `db` and `resb` directives in NASM. And you should allocate 255 bytes for your buffer if you set the first length byte to 255. That is, 257 bytes total for buffer length + actual used length + buffer. – ecm Oct 22 '22 at 11:02
  • @ecm added an edit – Özgür Güzeldereli Oct 22 '22 at 11:04