While developing my assembler I ran into a problem. In assembly language we can define data values (e.g. msg db 'Hi') and use the address of this data anywhere, even above the line that defines it. However, while assembling the code, the assembler doesn't know the address of the data until it has processed the line that defines it.
Of course, the assembler can remember the addresses in the machine code where the address of the defined data is used and, after processing the code, replace the values at the remembered addresses with the address of the data. But if the data is defined at a 2-byte address (e.g. at 01 F1), the assembler has to replace 1 byte at each remembered address with the 2 bytes of the address (01 F1). The size of the immediate field then changes (imm8 -> imm16), and the assembler has to rewrite the instruction at that address (change the w and s bits in the opcode and maybe add the 0x66 prefix). If the assembler adds the 0x66 prefix and the data definition comes after this instruction, it has to rewrite the bytes of the immediate field again (increment the address value).
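To make the size change concrete, here is a minimal Python sketch of the two encodings involved for ADD DX, imm in 16-bit mode. The function names fits_signed_imm8 and encode_add_dx are made up for this illustration, and only this one instruction form is handled; it is not taken from any real assembler.

```python
def fits_signed_imm8(value: int) -> bool:
    """True if a 16-bit value can be stored as one sign-extended byte."""
    value &= 0xFFFF
    return value <= 0x007F or value >= 0xFF80


def encode_add_dx(value: int) -> bytes:
    """Encode ADD DX, imm (16-bit operand), picking the shortest form.

    The opcode has the layout 100000sw: w=1 selects a word operand,
    s=1 means the immediate is a single sign-extended byte.
    """
    modrm = 0xC2  # mod=11 (register), reg=000 (ADD), rm=010 (DX)
    value &= 0xFFFF
    if fits_signed_imm8(value):
        return bytes([0x83, modrm, value & 0xFF])          # s=1, w=1, imm8
    return bytes([0x81, modrm, value & 0xFF, value >> 8])  # s=0, w=1, imm16


print(encode_add_dx(0x0000).hex(" "))  # 83 c2 00
print(encode_add_dx(0x01F1).hex(" "))  # 81 c2 f1 01
```

The placeholder 0 fits in one byte, so the short form is 3 bytes long; the real address 01F1 does not, so the instruction grows to 4 bytes and everything after it moves by one byte.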
Illustration of this algorithm:
The following code:
mov ah, 09h
add dx, msg
;...
msg db 'Hello$'
will be assembled in the following way:
- preparing the code:
Comment :                       |===> Remember address of this byte (0x0004)
Comment :           |  ADD DX,MSG  |
Address : 0000 0001 |0002 0003 0004| ... 01F1 01F2 01F3 01F4 01F5 01F6
Code    :  B4   09  | 83   C2   00 | ...  48   65   6C   6C   6F   24
Comment :           ----------------      H    e    l    l    o    $
- rewriting the code at the remembered addresses:
Comment :           |===================|-This address (msg)
Comment :           |    ADD DX,01F1    |      v
Address : 0000 0001 |0002 0003 0004 0005| ... 01F2 01F3 01F4 01F5 01F6 01F7
Code    :  B4   09  | 83   C2   F1   01 | ...  48   65   6C   6C   6F   24
Comment :           ---------------------      H    e    l    l    o    $
- rewriting the instruction's opcode 83h -> 81h (10000011b -> 10000001b: bit s = 0):
Comment :           |===================|-This address (msg)
Comment :           |    ADD DX,01F1    |      v
Address : 0000 0001 |0002 0003 0004 0005| ... 01F2 01F3 01F4 01F5 01F6 01F7
Code    :  B4   09  | 81   C2   F1   01 | ...  48   65   6C   6C   6F   24
Comment :           ---------------------      H    e    l    l    o    $
- writing the new address of the data (0x01F2) to the immediate field:
Comment :           |===================|-This address (msg)
Comment :           |    ADD DX,01F2    |      v
Address : 0000 0001 |0002 0003 0004 0005| ... 01F2 01F3 01F4 01F5 01F6 01F7
Code    :  B4   09  | 81   C2   F2   01 | ...  48   65   6C   6C   6F   24
Comment :           ---------------------      H    e    l    l    o    $
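The steps above can be replayed in a few lines of Python. This is only a sketch under assumptions: FIRST_INSN is the two bytes shown at 0000-0001 in the illustration, GAP_SIZE = 0x01EC is the size the elided code (;...) would need for msg to land at 01F1 in the first layout, and the helper names are hypothetical. It handles just this one ADD DX, imm instruction and one label.

```python
FIRST_INSN = bytes([0xB4, 0x09])  # bytes at 0000-0001 in the illustration
GAP_SIZE = 0x01EC                 # assumed size of the code behind ";..."


def fits_signed_imm8(value: int) -> bool:
    """True if a 16-bit value can be stored as one sign-extended byte."""
    value &= 0xFFFF
    return value <= 0x007F or value >= 0xFF80


def encode_add_dx(value: int, size: int) -> bytes:
    """Encode ADD DX, imm as the 3-byte (83h) or the 4-byte (81h) form."""
    value &= 0xFFFF
    if size == 3:
        return bytes([0x83, 0xC2, value & 0xFF])
    return bytes([0x81, 0xC2, value & 0xFF, value >> 8])


def resolve_msg():
    """Lay out the code, grow the ADD if needed, and recompute msg."""
    add_size = 3  # start with the short form and a 1-byte placeholder
    steps = []    # (instruction size, address of msg) for every layout tried
    while True:
        msg_addr = len(FIRST_INSN) + add_size + GAP_SIZE
        steps.append((add_size, msg_addr))
        if add_size == 3 and not fits_signed_imm8(msg_addr):
            add_size = 4  # imm8 -> imm16 moves msg, so lay out again
            continue
        return encode_add_dx(msg_addr, add_size), msg_addr, steps


code, msg_addr, steps = resolve_msg()
print(code.hex(" "), hex(msg_addr))  # 81 c2 f2 01 0x1f2
```

The instruction is only ever grown, never shrunk, so the loop stops after at most two layouts here: the first one puts msg at 01F1, the second at 01F2.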
I think that this algorithm is complicated. Is it possible to simplify it?