
I have an assembly test coming up and I have a question about pointers in assembly. I'm trying to do an exercise but I cannot solve it.

Consider the following C statements:

```c
int x=100, y=200;
int far *ptx;
int far *pty;
```

Assume that these statements have already been executed:

```c
ptx=&x;
pty=(int *)malloc(sizeof(int));
```

My question is how to code the following in assembly:

  1. `ptx=pty`
  2. `*ptx=*pty`
IanMoone
  • Why `far`? I haven't seen this in the last 25 years. What is your platform? – Jabberwocky May 02 '19 at 14:13
  • Which assembly? There is at least one assembly language for every platform (or more than one). – Serge May 02 '19 at 14:16
  • It's for 8086... – IanMoone May 02 '19 at 14:33
  • @Jabberwocky It's common for 8- and 16-bit MCUs with more than 64 KiB of memory mapped. However, those too are old technology nowadays. – Lundin May 02 '19 at 15:07
  • Yeah, I know... it's very old technology :/ Can you help me with the problem? I can't solve it :( – IanMoone May 02 '19 at 17:39
  • What flavor of assembly are you expected to use? – cHao May 02 '19 at 19:24
  • CPU 8086 assembly. – IanMoone May 02 '19 at 19:25
  • Doesn't answer the question. There are at least three popular and mutually-not-quite-compatible dialects of 8086 assembly. The differences are especially significant when it comes to defining/using local variables. – cHao May 02 '19 at 19:26
  • Sorry, but I only know that it is Intel 8086 CPU assembly. We use DOSBox with Turbo C++ and its inline assembler. – IanMoone May 02 '19 at 19:30
  • *\*sigh\** No wonder you're stuck with 8086. Turbo C++ is *effing ancient*. (But the choice of compiler suggests you're using a stripped-down variant of TASM.) – cHao May 02 '19 at 19:38
  • As worded this question reduces to "please do my homework for me". Can you improve it, so that we can feel more like you have something to contribute to your own learning? – Nate Eldredge May 02 '19 at 19:54
  • CS professors around the world need to learn that there are plenty of free, modern compilers out there. Students shouldn't be introduced to segmented-memory models first: all modern OSes provide a flat, 32- or 64-bit address space. – Davislor May 02 '19 at 20:36
  • @Davislor: absolutely agreed. There are good IDEs like MARS for a toy MIPS system, with a simplistic standard library (read/write integers and strings) accessible via `syscall` instructions, much better than making students learn legacy DOS and BIOS system call interfaces that have basically no relevance to anyone in 2019, except for the few existing real-world uses of DOS like for validating new motherboard designs. Or if you want to teach x86, 32-bit or 64-bit is *easier* because of a flat memory model, orthogonal addressing modes, and generally less "this register is special" stuff. – Peter Cordes May 02 '19 at 23:04
  • I agree with you, DOSBox is old and there are some mistakes... I'm sorry for the inconvenience; I really need to know how to do this because it's going to be on a test. Thanks – IanMoone May 03 '19 at 07:39

Are those declarations supposed to be at global scope? If so, there will be asm labels on the static storage for the C variables. If not (locals inside a function), they'll be on the stack, and I don't know how they expect you to know their offsets from BP.

Either way, they're 32-bit seg:off far pointers (little-endian, so the offset is in the low 16 bits and the segment in the high 16 bits), so copying one to the other is just a 4-byte copy you can do with two 16-bit integer loads and two stores.

Pointer variables (when they aren't optimized away or kept in a register) store the pointer value itself in memory, just like an `int` or `long`. In C, when you write `*pty`, the compiler has to load the pointer value into registers, then do a second load from the pointed-to memory.


I'm going to assume that DS refers to the data segment where the pointer values themselves are stored in memory, and that sizeof(int)=2, because that seems likely for a 16-bit C implementation.

To dereference `pty` and load the memory it points to (i.e. `*pty`), you need to load the segment part of the far pointer into a segment register, and the offset part into SI, DI, or BX (registers that can be used as part of an addressing mode). x86 has instructions for exactly that: LES and LDS.

Since we probably don't want to modify DS, I'll just use ES. (Different assemblers use different syntax for segment overrides, like `[es:di]` for NASM but `es:[di]` for TASM/MASM.)

```asm
;; *ptx = *pty
;; clobbers: ES, DI, and AX
; load *pty
    les  di, [pty]        ; load pty  from [DS:pty] into ES:DI
    mov  ax, es:[di]      ; load *pty into AX

; store *ptx
    les  di, [ptx]        ; load ptx  from [DS:ptx] into ES:DI
    stosw                 ; store AX to *ptx, i.e. [ES:DI]
```

STOSW stores AX to ES:DI and then increments or decrements DI by 2 according to the direction flag (DF). We don't care about DI's value after this instruction runs, but the standard calling convention for Turbo C++ (like modern x86 conventions) guarantees DF=0 (increment upward) on function entry and requires it on exit.

Use plain mov with another segment override if you haven't learned about string instructions yet.

(@MichaelPetch says DS is normally call-preserved in 16-bit real mode calling conventions, but that ES can be freely clobbered without saving/restoring it, so apparently I guessed right.)


Or, if you can clobber both DS and ES, you can use MOVSW. Saving and restoring DS with `push ds` / `pop ds` around it would make this more instructions than the version above, but still smaller in code size.

```asm
;; *ptx = *pty, assuming DS is correct for referencing static data like [pty]
;; clobbers: DS, ES, SI, DI
    les  di, [ptx]        ; load ptx  from [DS:ptx] into ES:DI (destination)
    lds  si, [pty]        ; load pty  from [DS:pty] into DS:SI (source)
    movsw                 ; copy a word from [DS:SI] to [ES:DI]
```

Note that I used lds second, because I'm assuming both globals in static storage are accessible through the incoming value of DS, not whatever segment value is part of the other far pointer.

If you had a "huge" or "large" memory model (or other model where not all static data is known to fit in one 64k segment), this would be more complicated, but your question didn't show anything about where ptx and pty are actually stored.


Also, I'm assuming you aren't supposed to optimize them away based on how they were recently assigned, even though the question shows you what they point to.

If you know ptx = &x, then you don't need to load ptx from memory, you can just mov [x], ax (again assuming a code model where static data like x is reachable via DS).

Also, it makes little sense to read from *pty when it's pointing at freshly-malloced storage, because that's uninitialized. The other way would make sense. I'm probably over-analyzing it.

Peter Cordes
  • I'm the upvote. The syntax is off for TASM/MASM (your comments suggest you are aware of that). The 16-bit real-mode calling conventions (by default) make DS non-volatile and ES volatile (some 16-bit compilers have an override for that, but it isn't standard). The direction flag is assumed to be cleared on entry, and has to be restored upon exit. – Michael Petch May 02 '19 at 20:59
  • The terminology should be either "Huge model" or "huge memory model". "Huge code model" might confuse people if they are reading older texts. In the huge model a single data item may be larger than 64kb (and you can have multiple code and data segments), it isn't whether all the data fits in a single data segment. The large model allows for multiple data (and code) segments but a single piece of data is no more than 64kb. – Michael Petch May 02 '19 at 22:01
  • The huge model also has a `huge` pointer type that is the default (different from `far`) where all pointers are normalized and the segment in the pointer can change. It is costly performance wise of course. – Michael Petch May 02 '19 at 22:19
  • @MichaelPetch: Thanks for the memory-model correction that "huge" has a specific technical meaning here, fixed. I didn't know the details, I was just reasoning by analogy from the x86-64 System V ABI's "huge" model where static data and even code can't be addressed with RIP+rel32 because there's too much static data to fit in one easily-addressable block. – Peter Cordes May 02 '19 at 22:56
  • Not sure if you wanted to see but this is the assembly generated for the global variables version: https://pastebin.com/tec2fPpm and the stack based one: https://pastebin.com/abfMbr25 . The function `test` does the `ptx=pty` and `test2` does `*ptx=*pty` . The way the question is worded I suspect the prof was looking at 1 and 2 as independent of each other. Both are compiled with the COMPACT model (COMPACT and LARGE are the two models where the default is FAR pointers). COMPACT is one code segment and multiple data segments. – Michael Petch May 02 '19 at 23:47
  • If you are in any of the models SMALL, MEDIUM, TINY, `malloc` will be a version that returns 16-bit NEAR pointers. You need to explicitly call `farmalloc` to get a FAR pointer to allocated heap data. So I assume that the code in this question has to be using at least the COMPACT or LARGE model. – Michael Petch May 02 '19 at 23:51
  • As for DGROUP in the global variable version - that is a placeholder for the segment that the global data is in (in compact and large models all global scope data go into the same section). `DGROUP:` will be fixed up by the DOS EXE loader at load time. – Michael Petch May 03 '19 at 00:01
  • @MichaelPetch: Thanks, interesting to see that a real compiler also chose `les`. And yeah, good point that if you did `ptx=pty` first, you could optimize away the `*ptx=*pty` :P I was also assuming they were independent. – Peter Cordes May 03 '19 at 00:01
  • No surprise: LDS and LES were heavily relied upon in most compilers. I only provided the output as you may have been interested as an FYI, nothing more. – Michael Petch May 03 '19 at 00:03
  • @MichaelPetch: I meant as opposed to `lds` or something. And yeah, other stuff like being able to assume that `ds` was set properly for access to static data on function entry, it seemed like a memory model with that property was the only way the question would be answerable without too much complication. – Peter Cordes May 03 '19 at 00:06
  • Thank you very much for the help, seriously – IanMoone May 03 '19 at 07:44