I'm just having fun with Turbo C, drawing "sprites" on an 8086/286 (emulated with PCem) with an MCGA/VGA card.
Compiled with Turbo C 3.0, it should work on a real 8086 with MCGA. I'm not using VGA Mode X because it is a bit complex and I don't need the extra VRAM for the things I want to do. Some flickering on the screen is OK :).
In C, I have a bunch of memcpy calls moving data from the loaded sprite struct to VGA memory in mode 13h:
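typedef unsigned char byte;
typedef unsigned short word;
/* assumes a far-data memory model, so that this is a far pointer to A000:0000 */
byte *VGA=(byte *)0xA0000000L;
typedef struct tagSPRITE
{
    word width;
    word height;
    byte *data;
} SPRITE;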
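void draw_sprite(SPRITE *sprite){
    word screen_offset = 0; /* start at the top-left corner of the screen */
    int i = 0; int j = 0;
    for(j=0;j<16;j++){
        /* copy one 16-pixel row of the sprite to the current scanline */
        memcpy(&VGA[screen_offset],&sprite->data[i],16);
        screen_offset+=320; /* next scanline in VRAM */
        i+=16;              /* next row of the sprite */
    }
}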
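For reference, here is a minimal portable sketch of the same row-by-row copy, assuming an opaque sprite and no clipping. It writes into an ordinary 320x200 byte buffer instead of real video memory, and it uses the width and height fields rather than the hard-coded 16. The name blit_sprite, the x/y parameters and the SCREEN_W/SCREEN_H constants are made up for illustration and are not part of my program.

```c
#include <string.h>

#define SCREEN_W 320   /* mode 13h is 320x200, one byte per pixel */
#define SCREEN_H 200

typedef unsigned char byte;
typedef unsigned short word;

typedef struct tagSPRITE {
    word width;
    word height;
    byte *data;        /* width*height bytes, rows stored back to back */
} SPRITE;

/* Hypothetical helper: copy an opaque sprite to position (x, y) of a
   SCREEN_W x SCREEN_H buffer. No clipping is done, so the caller must
   keep the sprite fully inside the screen. */
void blit_sprite(byte *screen, const SPRITE *sprite, word x, word y)
{
    word row;
    word screen_offset = (word)(y * SCREEN_W + x); /* first destination byte */
    word src = 0;                                  /* index into sprite->data */

    for (row = 0; row < sprite->height; row++) {
        memcpy(&screen[screen_offset], &sprite->data[src], sprite->width);
        screen_offset += SCREEN_W;   /* same column, next scanline */
        src += sprite->width;        /* next row of the sprite */
    }
}
```

The largest offset on screen is 199*320 + 319 = 63999, so a 16-bit offset is enough to address the whole mode 13h frame.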
The goal is to convert that code into a dedicated assembly function to speed things up a bit.
(editor's note: this was the original asm attempt and text that an answer was based on. See the revision history to see what happened to this question. It was all removed in the last edit, making only the asker's own answer make sense, so this edit tries to make both answers make sense.)
I tried to write it in assembly with something like this, which I'm sure has huge mistakes:
void draw_sprite(SPRITE *sprite){
    asm{
        mov ax,0A000h
        mov es,ax            /* ES points to the video memory */
        mov di,0             /* ES + DI = destination video memory */
        mov si,[sprite.data] /* source memory ram ???*/
        mov cx,16            /* bytes to copy */
        rep movsb            /* move 16 bytes from ds:si to es:di (I think this is the same as memcpy)*/
        add di,320           /* next scanline in vram */
        add si,16            /* next scanline of the sprite*/
        mov cx,16
        rep movsb            /* memcpy */
        /*etc*/
    }
}
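One detail of the attempt above can be checked without a DOS compiler: with the direction flag clear, rep movsb advances both si and di by one for every byte it copies, so after a 16-byte row both registers have already moved by 16. The sketch below models that in portable C. The MOVS_STATE struct and both function names are made up for illustration, and plain arrays stand in for the DS and ES segments. Under this model the step to the next scanline is 320 - 16 = 304, and si needs no adjustment as long as the sprite rows are stored back to back.

```c
typedef unsigned char byte;
typedef unsigned short word;

/* Hypothetical model of the registers that REP MOVSB reads and updates. */
typedef struct {
    word si;   /* source offset inside the DS segment */
    word di;   /* destination offset inside the ES segment */
    word cx;   /* repeat count */
} MOVS_STATE;

/* Models REP MOVSB with the direction flag clear (after CLD):
   copies CX bytes from DS:SI to ES:DI, incrementing SI and DI and
   decrementing CX once per byte. es_mem must span 64K. */
void rep_movsb(MOVS_STATE *r, const byte *ds_mem, byte *es_mem)
{
    while (r->cx != 0) {
        es_mem[r->di] = ds_mem[r->si];
        r->si++;
        r->di++;
        r->cx--;
    }
}

/* Copies `height` rows of `width` bytes to consecutive scanlines of a
   320-byte-wide screen, the way a corrected asm loop would. */
void copy_rows(MOVS_STATE *r, const byte *ds_mem, byte *es_mem,
               word width, word height)
{
    word row;
    for (row = 0; row < height; row++) {
        r->cx = width;
        rep_movsb(r, ds_mem, es_mem);
        /* DI has already advanced by `width`, so add only the rest
           of the scanline. SI already points at the next sprite row. */
        r->di += (word)(320 - width);
    }
}
```

After two 16-byte rows starting from si = 0 and di = 0, the model ends with si = 32 and di = 640, which is the start of the third scanline.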
I know a RAM address can be bigger than 64K, so it doesn't fit in a single 16-bit register and mov si,[sprite.data] is not going to work.
So how do I pass the RAM address to the si register (if that's possible)?
I know I have to use the ds and si registers together: ds selects something like a "bank", and si can then address a 64K chunk of RAM inside it (so that movsb can move ds:si to es:di). But I just don't know how it works.
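As far as I understand real-mode addressing, the "bank" is a segment: the CPU forms the physical address as segment * 16 + offset, and a far pointer such as 0xA0000000L simply packs the segment into the high 16 bits and the offset into the low 16 bits. The sketch below shows that arithmetic in portable C. The function names are made up for illustration; in Turbo C itself the FP_SEG and FP_OFF macros from dos.h do the splitting, and the lds instruction can load ds and si from a far pointer stored in memory.

```c
typedef unsigned short word;

/* Physical (linear) address the CPU forms from segment:offset
   in real mode: segment * 16 + offset. */
unsigned long linear_address(word segment, word offset)
{
    return ((unsigned long)segment << 4) + offset;
}

/* Split a far pointer stored as a 32-bit value (segment in the high
   word, offset in the low word), e.g. 0xA0000000 = A000:0000. */
word far_seg(unsigned long far_ptr)
{
    return (word)(far_ptr >> 16);
}

word far_off(unsigned long far_ptr)
{
    return (word)(far_ptr & 0xFFFFUL);
}
```

For example, A000:0000 is physical address 0xA0000, the start of mode 13h video memory, and A000:0140 (offset 320) is the start of the second scanline. Different segment:offset pairs can name the same byte: 1234:0010 and 1235:0000 both resolve to 0x12350.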
I also wonder if that asm code would be faster than the C code (on an 8 MHz 8086, or a 286), because you don't have to repeat the setup on every loop iteration.
I'm not copying from VRAM to VRAM for the moment, because I'd have to use Mode X, and that's another story.