
I'm working on a "Write your own Operating System using assembly only by holding your hand along the way" project. I've written everything from scratch, along the lines of MikeOS, with the exception of hard drive reading and writing. Reading and writing matter because they allow an operating system to be larger than the 512-byte boot sector that the Basic Input Output System (BIOS) loads.

I wrote a test case that reads from the hard drive and prints the first character on screen. Using QEMU, I can define which file acts as the hard drive, and I set up the registers so that the program points at that hard drive with the 0x80 option when it calls the read interrupt.

The first version of the test did not select a drive, so the reading code read its own running binary and printed an "a", presumably because the binary loaded by the BIOS begins with a byte that is an ASCII "a".

So my questions are:

  1. When I select the hard drive, why doesn't it display the string "Hello" from the file on screen, like it does when I don't select the hard drive?

  2. Is my code correct to begin with, or is the "a" on screen just a coincidence because the binary that runs in the emulator happens to begin with an "a"?

For everyone's sake, please respond with explanations and code. I've found two questions on Stack Overflow that are both answered with English explanations of what is wrong with the code, but with no code demonstrating that the explanations are correct. An explanation is of little use if you don't show your work.

Do not suggest that I use C, C++, or anything other than assembly. If you have done your research, as I have, and disassembled high-level languages like C or C++, you will see that even a simple "define a string and print it to the console" program is not only significantly larger but also careless with respect to efficiency. These disassembled C comparisons against handwritten assembly examples will also be released in the project files.

I appreciate your time, and when I release the project as open-source tutorials, others in the same position as me will appreciate how much time you have saved them. ("Human knowledge belongs to the world." - Antitrust, 2001)

[bits   16]
[org 0x7c00]

message db "Hello"

mov ah, 0x02    ;point to read sector function
mov ch, 0x00    ;ch track/cylinder number
mov dh, 0x00    ;dh head number
mov cl, 0x00    ;cl sector number
mov dl, 0x80    ;drive number 80 is drive0, 81 is drive1
int 0x13

;read disk into terminal.... might be working, just displays 'a'...
mov ah, 0x0e
mov al, [bx]
int 0x13

;Purpose, move to the next character read from file
mov ah, 0x0e
mov es, bx
mov al, [bx]
int 0x13

times 510 - ($-$$) db 0
dw 0xAA55



;==============================================================
;               Reference Notes
;==============================================================
;
;;;;;;;;;;;;; Interrupt 13h, AH = 02h - Read sectors ;;;;;;;;;;;;
;AH = 02
;AL = number of sectors to read (1-128 dec.)
;CH = track/cylinder number  (0-1023 dec., see below)
;CL = sector number  (1-17 dec.)
;DH = head number  (0-15 dec.)
;DL = drive number (0=A:, 1=2nd floppy, 80h=drive 0, 81h=drive 1)
;ES:BX = pointer to buffer
;
;on return:
;AH = status  (see INT 13,STATUS)

;AL = number of sectors read
;CF = 0 if successful
;   = 1 if error

;;;;;;;; Interrupt 10h - Video services ;;;;;;;;
;AH = 0Eh, teletype (TTY) output: print a character to the console
;AL = character to print
Michael Petch
  • Don't put your string at the beginning or the cpu will try to execute it as code. Also, `mov al, [bx]` without `bx` or `ds` being set is very suspicious. – Jester Mar 12 '18 at 19:43
  • Hi, and welcome to Stack Overflow. Your post was written in quite angry and arrogant tone. I have edited your post to contain a more neutral and clear question. – Hannes Karppila Mar 12 '18 at 19:53
  • My response to your assumptions of arrogance and anger is that today's generation rarely communicates face to face, which makes even in-person interactions hard to read, not to mention assumptions made through text. It is imperative to add as much information as possible to get the best answer to my question. Here is a link describing the psychology behind why your assumptions about emotions are wrong: https://www.fastcodesign.com/3036748/why-its-so-hard-to-detect-emotion-in-emails-and-texts – Dahlia Radio Mar 12 '18 at 20:03
  • @Jester you have not interpreted the code correctly; that is merely a string, there is nothing to execute. The compiler and the execution system know the difference between a string and a "command". BX is set when interrupt 2 is executed; I gave reference notes at the bottom of the code for a reason. Please refrain from posting before you think. – Dahlia Radio Mar 12 '18 at 20:07
  • @Jester is also correct that before calling [`int 13h/ah=2h`](http://www.ctyme.com/intr/rb-0607.htm) you need to set up ES:BX to point to the memory location the disk read should place its data in, BEFORE you do the `int 0x13` – Michael Petch Mar 12 '18 at 20:34
  • Wow, there is a lot the assembly books and tutorials did not make clear, and as a result they made other things look like axioms when they are not. Thank you, I will try all of these suggestions. – Dahlia Radio Mar 12 '18 at 20:39
  • Well, it appears that this "order of operations test" shows that the compiler knows the difference between a string and a command. In retrospect, everything is a command, and a label is a dynamic command that composes a string as ASCII bytes that the executor does not confuse. I used the following code to prove that a string isn't "executed", because the binary has an ASCII structure and not the command binary of a mov or jmp, etc.: `[bits 16] [org 0x7c00] message db "int 0x13 Hello" mov ah, 0x0e mov al, 'w' int 0x10 times 510 - ($-$$) db 0 dw 0xAA55` – Dahlia Radio Mar 12 '18 at 20:53
  • @michael, yeah, you're correct, that is supposed to be interrupt 10h (display character) and not 13h (hard drive). I will see what happens. – Dahlia Radio Mar 12 '18 at 20:54
  • @MichaelPetch according to the reference notes found on a wiki, you set CL to define which sector to read and the return *sets AL* to the number of sectors *read*, but you're right, CL should be set to 01 instead of 00. Here is the link to the reference notes for interrupt 13h: https://en.wikipedia.org/wiki/INT_13H#INT_13h_AH=02h:_Read_Sectors_From_Drive – Dahlia Radio Mar 12 '18 at 21:08
  • I removed the Hello string just to prevent confusion. I'm going to post revised code soon. – Dahlia Radio Mar 12 '18 at 21:12
  • @MichaelPetch you should think before you post! :D @Dahlia: the ascii code of `Hello` corresponds to machine code for instructions `dec ax; gs: insb; insb; outsw;` so that's what the cpu will do. – Jester Mar 12 '18 at 21:14
  • **message db "mov ah, 0x0e"** produces the hex code *6d 6f 76 20 61 68 2c 20 30 78 30 65*, which indicates a "string". Therefore the only execution going on in the CPU will be that the CPU recognizes it as a "string". The *command* **mov ah, 0x0e** produces the hex code *b4 0e*, which will execute the *CPU operations* of *mov ah, 0x0e* instead of being recognized as a string. This kind of confusion is why I want to build an OS; KolibriOS, MikeOS, and possibly BareMetal have a lot of bloated logic/code due to this logic, increasing the perpetual confusion. When will it end? – Dahlia Radio Mar 12 '18 at 22:13
  • Here is another reference I'm sifting through, a detailed explanation of using int 13h: http://www.isdaman.com/alsos/protocols/fats/nowhere/HD.HTM – Dahlia Radio Mar 12 '18 at 22:30
  • I think Jester was right about BX not being set... it's really confusing to know what is true. According to https://en.wikipedia.org/wiki/INT_13H#INT_13h_AH=02h:_Read_Sectors_From_Drive, ES:BX is a parameter... I thought I read somewhere it was a result.... When will the confusion end! lol, seriously... I'm so embarrassed. – Dahlia Radio Mar 12 '18 at 22:37
  • *which indicates a "string"* no way; the CPU doesn't recognize anything as a string, it just tries to interpret bytes as instructions, and pretty much any byte sequence can be interpreted as such. The fact that some bytes are to be interpreted as a string and not as instructions doesn't come from some particular encoding, but from the fact that your code won't tell the CPU to jump into the middle of a string. Your `db "mov ah, 0x0e"` string, if executed, is `insw` `outsw` `jna ip+20` `popa` `push word 0x202c` `xor [bx+si+0x30],bh` `gs`. – Matteo Italia Mar 12 '18 at 22:38
  • So basically a string doesn't exist but it exists? ASCII exists but doesn't exist; a CPU can compare, split, and stream strings to a screen but can't understand a string; you can produce characters on a screen but it doesn't know what a character is? Seriously? At least I have the binary/hex showing that a string exists as an ASCII binary sequence and not a binary command.... Can you explain, Matteo, how a string exists but doesn't exist? – Dahlia Radio Mar 12 '18 at 22:41
  • @DahliaRadio: bytes are bytes, it all comes down to how you interpret them. If you tell the CPU to copy them into video memory, you'll see them as text; if you put them in the CPU's execution path, they'll be interpreted as machine code instructions; in assembly all you have is bytes and instructions that tell the machine what to do with them. There's nothing intrinsic about bytes that tells the machine their type - it all comes down to how you tell the CPU to consider them. – Matteo Italia Mar 12 '18 at 22:44
  • That's the same with numeric data - if you tell the CPU to load into a floating-point register data that was originally put there as a string or as an integer, it will happily oblige. There's no type check of any kind on memory, and since x86 is a von Neumann architecture, the boundaries between data and code are definitely blurry. – Matteo Italia Mar 12 '18 at 22:46
  • Also, see how the CPU actually sees your string... https://pastebin.com/T2nL8fhm – Matteo Italia Mar 12 '18 at 22:54
  • Oh! Dang, I didn't even think to disassemble the assembled output to see what it looks like to the CPU. It is very confusing that the hex comes out as ASCII, but the disassembled code shows that the CPU will run it as instructions. – Dahlia Radio Mar 12 '18 at 23:13
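The point Jester and Matteo Italia make can be checked with the assembler itself. In the sketch below (assuming NASM; the labels `as_data` and `as_code` are illustrative), the first half emits `Hello` as data and the second half emits the instructions Jester decoded from it. Both halves should assemble to the same five bytes, 48 65 6C 6C 6F, and disassembling the output with `ndisasm -b 16` (which ships with NASM) decodes both halves as the same instructions. That is what the CPU does when the string sits at 0x7C00 and execution starts there.

```nasm
bits 16

; The string as data: five ASCII bytes
as_data:
    db "Hello"              ; 48 65 6C 6C 6F

; The same bytes written as instructions
as_code:
    dec ax                  ; 48
    gs insb                 ; 65 6C (GS segment-override prefix, then INSB)
    insb                    ; 6C
    outsw                   ; 6F
```

Assemble with `nasm -f bin` into a flat binary and view it in a hex dump or with `ndisasm -b 16`; nothing in the output records which bytes were "meant" as a string.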

1 Answer


By placing `message` at the start, before the code, you make the processor attempt to decode the string as instructions and execute them. This can lead to unexpected behavior and crashes. You can place the data after the code and before the boot signature. The assembler (NASM) produced a sequence of bytes containing `Hello` at the beginning of the bootloader. The CPU can't tell the difference between bytes that make up a string and bytes that are actual code; it sees a sequence of bytes, attempts to decode them, and executes them.
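Putting the data after the code is one fix. If you would rather keep the string near the top of the file, another common layout is to make the very first instruction a jump over the data, so execution never reaches the string bytes. This is a sketch assuming NASM; the labels `start`, `.next` and `.done` are illustrative and not part of the original answer.

```nasm
bits 16
org 0x7c00

    jmp start               ; first instruction: jump over the data below

message db "Hello", 0       ; never executed, because nothing jumps into it

start:
    xor ax, ax
    mov ds, ax              ; DS = 0, so offsets based on org 0x7c00 are correct
    cld                     ; make LODSB advance SI forward
    mov si, message
    mov ah, 0x0e            ; BIOS teletype output
    xor bx, bx              ; display page 0
.next:
    lodsb                   ; AL = byte at DS:SI, then SI = SI + 1
    test al, al             ; stop at the NUL terminator
    jz .done
    int 0x10                ; print the character in AL
    jmp .next
.done:
    jmp $                   ; loop forever instead of running into the padding

times 510 - ($-$$) db 0
dw 0xAA55
```

Only relative jumps are used, so the layout does not depend on where the string ends up; the cost is two bytes for the initial `jmp`.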

Your code doesn't prevent the processor from executing past the end of your actual instructions. You can fix this by using `jmp $` to create an infinite loop, or by turning off interrupts and halting the processor with `cli` followed by `hlt`.
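One detail about the `cli`/`hlt` option: a non-maskable interrupt (NMI) can still wake the CPU from `hlt` even with interrupts disabled, after which execution continues with the next instruction. A minimal boot sector sketch (assuming NASM; the label `hang` is arbitrary) that guards against this loops back to the halt:

```nasm
bits 16
org 0x7c00

hang:
    cli                     ; disable maskable hardware interrupts
    hlt                     ; sleep until an NMI (or SMI) arrives
    jmp hang                ; if woken, go straight back to sleep

times 510 - ($-$$) db 0
dw 0xAA55
```

Compared with `jmp $`, this keeps the CPU idle instead of spinning at full speed, which can matter under an emulator.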

Int 13h/ah=2h disk read requires:

  • the number of sectors to read in AL
  • the segment:offset address where the disk read should place the data, in ES:BX, set before the interrupt call
  • the boot drive in DL: don't hard code it, use the value the BIOS passed us in DL
  • Sector numbers (unlike cylinders and heads) are 1 based values, not 0 based. This means that the first sector on the disk (the bootloader) is at CHS=(0,0,1), not (0,0,0). Using a sector number of 0 is invalid.
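The list above covers the inputs, but the bootloader in this answer does not check the result. A slightly more defensive sketch (assuming NASM; `read_sectors_retry` is a hypothetical helper name and the limit of 3 attempts is an arbitrary choice) tests CF after the read and, on failure, resets the drive with INT 13h AH=0 before retrying. Retrying is commonly recommended for real floppy drives, where a first read can fail while the motor spins up.

```nasm
bits 16
org 0x7c00

    xor ax, ax
    mov ds, ax
    mov es, ax              ; buffer segment ES = 0
    mov ss, ax
    mov sp, 0x7c00          ; stack grows down below the bootloader
                            ; DL still holds the boot drive from the BIOS
    mov bx, 0x7e00          ; ES:BX = 0x0000:0x7E00
    mov al, 1               ; read 1 sector
    mov ch, 0               ; cylinder 0
    mov dh, 0               ; head 0
    mov cl, 1               ; sector 1 (sectors are 1-based)
    call read_sectors_retry
    mov al, 'O'             ; 'O' = read succeeded (MOV leaves CF alone)
    jnc show
    mov al, 'E'             ; 'E' = every attempt failed
show:
    mov ah, 0x0e            ; BIOS teletype output
    xor bx, bx              ; display page 0
    int 0x10
    jmp $

; read_sectors_retry (hypothetical helper)
; Inputs:   AL = sector count, CH = cylinder, CL = sector, DH = head,
;           DL = drive, ES:BX = buffer
; Returns:  CF = 0 on success, CF = 1 if all attempts failed
; Clobbers: AX, DI
read_sectors_retry:
    mov di, 3               ; attempts left
.try:
    push ax                 ; INT 13h overwrites AX with status/count
    mov ah, 0x02            ; read sectors
    int 0x13
    jnc .ok                 ; CF = 0: success
    xor ah, ah              ; AH = 0: reset disk system for drive DL
    int 0x13
    pop ax                  ; restore AL = sector count
    dec di                  ; DEC does not touch CF
    jnz .try
    stc                     ; report failure
    ret
.ok:
    pop ax                  ; balance the stack; CF is still clear
    ret

times 510 - ($-$$) db 0
dw 0xAA55
```

Booted from a working disk image this should print `O`; on a real failure AH holds the INT 13h status code, which a fuller version could print.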

This bootloader reads a copy of the bootloader into memory at 0x0000:0x7E00 from Cylinder, Head, Sector (CHS) (0,0,1) of the boot disk passed to the bootloader in DL by the BIOS. This code follows my General Bootloader Tips by setting up our own stack and setting the DS register to 0 explicitly.

bits   16
org 0x7c00

    cld             ; Set forward direction for string related functions like LODSB
    xor ax, ax      ; XOR register with itself sets register to 0
    mov ds, ax      ; DS = 0x0000
    mov ss, ax      ; SS:SP =0x0000:0x7C00
    mov sp, 0x7c00  ; Set SS:SP since we will be loading data from disk to memory
                    ;    We want to ensure the data we read is not in the same
                    ;    memory as the stack. Stack will grow down from 0x0000:0x7c00
                    ;    in memory just below the bootloader

    ; Read 1 sector starting at CHS = 0,0,1 (the boot sector)
    ; to 0x0000:0x7E00. This creates a copy of the bootloader
    ; in the memory just after where we are loaded at
    ; 0x0000:0x7c00
    mov es, ax      ; ES:BX memory to read to 0x0000:0x7E00
    mov bx, 0x7e00  ;     0x0000:0x7E00 is in the memory just after our bootloader
                    ;     which ran from 0x7c00 to 0x7dff.
    mov al, 0x01    ; Number of sectors to read
    mov ah, 0x02    ; point to read sector function
    mov ch, 0x00    ; ch track/cylinder number
    mov dh, 0x00    ; dh head number
    mov cl, 0x01    ; cl sector number
                    ; dl is set by the BIOS before transferring control to our bootloader
    int 0x13

    ; Print the message from the copy of the bootloader loaded at 0x7E00
    mov si, message + 0x200
    call print_string
    cli
    hlt

; Function: print_string
;           Display a string to the console on display page 0
;
; Inputs:   SI = Offset of address to print
; Clobbers: AX, BX, SI

print_string:               ; Routine: output string in SI to screen
    mov ah, 0eh             ; BIOS tty Print
    xor bx, bx              ; Set display page to 0 (BL)
    jmp .getch
.repeat:
    int 10h                 ; print character
.getch:
    lodsb                   ; Get character from string
    test al,al              ; Have we reached end of string?
    jnz .repeat             ;     if not process next character
.end:
    ret

message db "Hello", 0

times 510 - ($-$$) db 0
dw 0xAA55

`print_string` is a function I wrote that prints a NUL-terminated string. The memory offset of the string is passed in the SI register.
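To address question 2 directly (did the read really work, or is the output a coincidence?), one option is to compare the 512 bytes the BIOS loaded at 0x7C00 with the copy read to 0x7E00. The fragment below is a sketch, assuming it is placed right after the `int 0x13` in the bootloader above, where DS = ES = 0 and the direction flag has already been cleared with `cld`; the label `show_result` is hypothetical. It prints `Y` if the two copies match and `N` otherwise, and then the existing code goes on to print `Hello`.

```nasm
    ; Compare the running boot sector (DS:SI = 0x0000:0x7C00) with the
    ; copy read from disk (ES:DI = 0x0000:0x7E00).
    mov si, 0x7c00
    mov di, 0x7e00
    mov cx, 512             ; one sector
    repe cmpsb              ; stop early at the first differing byte
    mov al, 'Y'             ; 'Y' = copies match (MOV leaves ZF alone)
    je show_result
    mov al, 'N'             ; 'N' = mismatch: failed read or wrong sector
show_result:
    mov ah, 0x0e            ; BIOS teletype output
    xor bx, bx              ; display page 0
    int 0x10
```

The fragment clobbers AX, BX, CX, SI and DI, which is harmless here because the following code reloads SI and `print_string` sets up AH and BX itself.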

Michael Petch
  • Thank you. I have much to learn about proper assembly code, but the result of all this will clear up the same confusion others are having. I appreciate the time you have put into this. – Dahlia Radio Mar 12 '18 at 22:46