
I'm making a program in TASM assembly (I honestly have no idea whether it's 16-bit, x86, 64-bit, or maybe 8086). I'm using TASM inside DOSBox for assembling, linking, and testing.

The program reads characters until the user presses the Enter key, then echoes what they typed back to the console. This is my code:

IDEAL
model small
STACK 100h

DATASEG
wholeinput db 00

CODESEG
start:

mov ax,@data
mov ds,ax
    mov bx, 0 ; bx is the counter

    input:
    mov ah, 1 ; input moves the input to al
    int 21h

    cmp al, 13 ; 0xD is enter key
    je enterPressed ; if its not 'enter' then continue
    add [wholeinput+bx], al ; add the given character to the end of the string

    inc bx    ; inc counter of string length
    jmp input ; continue loop


    enterPressed:
    add [wholeinput+bx], '$' ; add to the last byte of the thing a string terminator
    mov ah, 9
    mov dx, offset wholeinput ; print the whole thing
    int 21h


exit:
mov ax,4c00h
int 21h
END start

The first time I run the program it works as expected. When I run the same program a second time with the same input it prints gibberish to the console.

My theory is that the memory from my previous run is somehow copied to the next run. This could be a very stupid thing to say, but I'm a noob...

Screenshot

What could be my problem? How can I solve it?

Edit: Thanks to Mike Nakis and Peter Cordes for this solution:
The problem was that I did not reserve enough space for the input.

wholeinput db 00

only reserves one byte. Fix:

wholeinput db 100 dup(0)

Michael Petch
ori6151
  • what happens if you hit [Enter] instead of typing "hello world" and then hitting [Enter] ? – Mike Nakis Oct 07 '19 at 15:43
  • As a deleted answer points out, `wholeinput db 00` is only 1 byte, but you store multiple bytes there. Potentially you're overwriting DOS's own data. – Peter Cordes Oct 07 '19 at 15:47
  • @MikeNakis Just blank newline. nothing interesting – ori6151 Oct 07 '19 at 15:48
  • @PeterCordes what should I define "wholeinput" as? wholeinput dw 00? – ori6151 Oct 07 '19 at 15:48
  • @PeterCordes and also how is it then that the first time I run my program it's fine? – ori6151 Oct 07 '19 at 15:50
  • That would work if you wanted to store at most 2 bytes. Your input loop doesn't have any upper bound on maximum length. Look up in your assembler manual, or search, for how to reserve space for a buffer, like `db 100 dup(?)`. – Peter Cordes Oct 07 '19 at 15:52
  • @ori6151 what I am trying to get to is this: if you just type nothing and hit [Enter], does your program, when run twice, give any garbage the second time? Or does the erratic behavior disappear when you enter the empty string? – Mike Nakis Oct 07 '19 at 15:52
  • It might be that it runs fine the first time because the memory you overwrite is "owned" by the shell, or by parts of DOS that do program startup. Your program is already started so the only parts of DOS it depends on are the `int 21h` stdout printing system call handler, and the exit system call. To find out exactly what's going on, you'd want to use a debugger on the whole system, e.g. DOSBox's built-in debugger, to see what later used the memory you scribbled over. (If that's even the problem.) – Peter Cordes Oct 07 '19 at 15:53
  • @MikeNakis yes it does not print garbage when I just hit [Enter]. – ori6151 Oct 07 '19 at 15:57
  • @PeterCordes I don't understand why it works then. If I only have space for 1 byte, why does the program still accept that? Shouldn't that be some kind of overflow? – ori6151 Oct 07 '19 at 15:59
  • Yes, it is an overflow. But DOS doesn't have any memory protection so you can write and read data from there. But it's messed up when something else tries to read *its* data from there. – Peter Cordes Oct 07 '19 at 16:14
  • Besides `wholeinput` being 1 byte as others mention, something I find very odd is that you use `add [wholeinput+bx], al`. `add` wouldn't be correct, maybe you meant `mov`? Same with `add [wholeinput+bx], '$'`, which should be `mov`. Adding `$` to whatever value the byte already holds produces a character that isn't a `$` on subsequent runs of the program, and thus DOS prints garbage until it finds a `$`, or prints indefinitely if there is no `$` at all. – Michael Petch Oct 07 '19 at 18:30
  • Likely memory was initialized with 0s at system boot and it appeared to work the first time it was run, but failed on subsequent runs because that memory had data from a previous run of the program and you were `add`ing to memory rather than `mov`ing. This would explain why it goes back to working properly if you reload DOSBox. – Michael Petch Oct 07 '19 at 18:32
  • @MichaelPetch: you're spot on on your init with 0s theory. This happens when using DOSBox DOS emulation. OTOH, out of curiosity, I've tried with several DOS versions, and they don't bother clearing memory, returning gibberish from the first run. – ninjalj Oct 08 '19 at 03:31

1 Answer


Your theory (that memory from a previous run is still there) is probably correct, not a stupid idea at all.

db 00 will only reserve space for a single byte. So you're writing past the end of your DATA segment for any input of more than 0 bytes. (Just pressing Enter right away will lead to your program storing a '$' to that byte.)

You need db 100 dup(0) for an array of 100 zeros. See What does `dup (?)` mean in TASM?
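As a minimal sketch of the fix (not the asker's final code), the program below reserves a real buffer, stores each character with `mov` rather than `add` as the comments on the question suggest, and stops reading when the buffer is full. The name `BUFSIZE` and the value 100 are assumptions chosen for illustration.

```asm
IDEAL
MODEL small
STACK 100h

BUFSIZE EQU 100                 ; hypothetical capacity, including the '$' terminator

DATASEG
wholeinput db BUFSIZE dup(0)    ; reserve BUFSIZE bytes instead of a single byte

CODESEG
start:
    mov ax, @data
    mov ds, ax
    mov bx, 0                   ; bx = number of characters stored so far

input:
    cmp bx, BUFSIZE - 1         ; keep the last byte free for the '$' terminator
    jae enterPressed            ; buffer full: stop reading
    mov ah, 1                   ; DOS function 01h: read one character (with echo) into al
    int 21h
    cmp al, 13                  ; 13 (0Dh) is the Enter key
    je enterPressed
    mov [wholeinput+bx], al     ; store the character (mov, not add)
    inc bx
    jmp input

enterPressed:
    mov [wholeinput+bx], '$'    ; terminate the string for DOS function 09h
    mov ah, 9                   ; DOS function 09h: print '$'-terminated string at ds:dx
    mov dx, offset wholeinput
    int 21h

exit:
    mov ax, 4c00h               ; DOS function 4Ch: exit with return code 0
    int 21h
END start
```

The length check matters because the input loop otherwise has no upper bound, so a long enough line would overflow even a 100-byte buffer.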

Why reserving insufficient space causes this particular kind of behavior, I do not know, and to tell you the truth, this is not the behavior I would expect from this type of mistake. But then again, in our profession, all kinds of weird things happen, and when you see what the most likely cause of the problem is, you can either quickly fix it and proceed with your life, or you can spend hours troubleshooting to try and figure out precisely why the particular behavior was observed. The choice is yours.
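One possible explanation, following Michael Petch's comments on the question, is that the code uses `add` where `mov` was intended. The annotated fragment below works through the arithmetic. It assumes that the bytes past the first one start out as 0 on the first run and are not reloaded from the program image on later runs; that is an assumption about DOSBox's behavior, not something verified here.

```asm
; Assumption: a byte somewhere past wholeinput's first byte holds 00h
; before the first run and keeps its value between runs.
;
; First run, the user types 'h' (68h) at that position:
;   add [wholeinput+bx], al     ; 00h + 68h = 68h = 'h', looks correct
; Second run, the byte still holds 68h:
;   add [wholeinput+bx], al     ; 68h + 68h = D0h, no longer 'h'
;
; The terminator is affected the same way ('$' is 24h):
;   first run:  00h + 24h = 24h = '$'
;   second run: 24h + 24h = 48h = 'H', so function 09h finds no '$'
;               at that position and keeps printing past it
;
; Storing instead of accumulating removes the dependence on old contents:
    mov [wholeinput+bx], al
```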


I am not sure whether TASM was ever made for anything other than 16-bit; in any case, the code definitely looks 16-bit, and it is also definitely being assembled by TASM as 16-bit, otherwise the model small clause would be giving an error, as this clause only exists in 16-bit mode.

Peter Cordes
Mike Nakis
  • Thanks. I will try to implement that when I come home. But I just want to ask, what does "model small" mean? Also, do you know where I could find, for example, documentation for this kind of assembly? What kind of assembly is it? – ori6151 Oct 07 '19 at 16:06
  • No, MASM/TASM syntax is `count dup(value)`. So `100 dup(?)` will reserve 100 bytes of don't-care values. – Peter Cordes Oct 07 '19 at 16:15
  • Are MASM and TASM the same syntax? – ori6151 Oct 07 '19 at 16:19
  • @ori6151 MASM and TASM (even Watcom ASM and JWASM) have similar syntax (pretty close) IF you aren't using IDEAL mode (which you are). Ideal mode is specific to TASM. – Michael Petch Oct 07 '19 at 16:43
  • What is IDEAL mode? – ori6151 Oct 07 '19 at 16:47
  • @ori6151 IDEAL mode is a variation of the MASM syntax invented by Borland that tries to be more consistent than the original syntax. In my opinion, the IDEAL syntax really is notably better, but I don't think it's worth the incompatibility with MASM if you care even a slight bit about source portability. – Michael Karcher Oct 07 '19 at 19:57
  • @ori6151 with respect to "ideal mode", what Michael Karcher said is correct. Back in the day when I was writing in assembly I considered ideal mode to be so much superior to MASM syntax that I did not mind at all burning that compatibility bridge, but then again, that's just me. – Mike Nakis Oct 08 '19 at 07:19
  • @ori6151 the "model" directive is basically a hint that you are giving to the assembler about your segment strategy and therefore also the size of your code pointers and the size of your data pointers. It is necessary because the 16-bit Intel architecture is capable of addressing more than 64k of memory, and it achieves this by using 16-bit segment registers and 16-bit index registers or 16-bit offsets. "small" means that all of your code will fit within 64k, and all of your data will also fit within 64k, but your code segment may differ from your data segment. – Mike Nakis Oct 08 '19 at 07:26
  • For more information, see http://www.c-jump.com/CIS77/ASM/Directives/lecture.html#D77_0020_model_directive – Mike Nakis Oct 08 '19 at 07:26