
So I have a question. I am currently looking into assembly. I don't really want to code in it, but it'll be useful for some of the C/C++ projects I'm working on.

I am looking at this page: http://ref.x86asm.net/coder32.html and it seems like there are a lot of different variations for instructions.

Like:

mov eax, eax

is different from

mov 0xaddress, 0xaddress

Why doesn't the computer just interpret these itself, based on the "arguments"?

Jester
SoLux
  • The second isn't even valid. Other than that, the assembler will choose the appropriate encoding for you, that's what it's for. The cpu needs to be told the operand types. – Jester Feb 03 '18 at 20:15
  • Yeah, my bad, I just wanted to show that they can vary; I guess that's wrong. But my question is why the CPU needs to be told that and can't just work it out itself. I just want a bit of info on the internals. Any links or anything would help. – SoLux Feb 03 '18 at 20:17
  • There are no arguments (like in a high-level language). The CPU is hardwired to know only a certain set of instructions (with certain "arguments" from fixed options), and those are hardwired to make them performant. Actually, modern x86 processors are sort of interpreters anyway, but back in 8086 times the HW transistors for each operation code did exactly what was designed by hand, and nobody bothered to slow it down with some kind of clever interpreter. – Ped7g Feb 03 '18 at 20:17
  • @Ped7g thanks man, this is what I was asking, really. Do any of you have links to books/websites with more info on the internals of a computer/asm and such that I could use? – SoLux Feb 03 '18 at 20:19
  • No, but I do have an answer. Really it's Jester and Ped7gs, but an answer. @Charles –  Feb 03 '18 at 20:20
  • So the whole point of assembly is to translate directly 1:1 into machine code instructions, which have only the fixed options defined by the CPU manufacturer, nothing more. Some assemblers support some sort of compilation (non 1:1 mapping), like pseudo-instructions (the norm for MIPS assemblers) or macros, but in the x86 world whatever you write in ASM source will usually assemble directly to a single exact x86 instruction, so you have full control over the resulting machine code. – Ped7g Feb 03 '18 at 20:21
  • You're just trying to get me to paste some stuff I won't even read @Ped7g –  Feb 03 '18 at 20:21
  • See also the _"CHAPTER 2 INSTRUCTION FORMAT"_ in the official intel instruction set reference. – Jester Feb 03 '18 at 20:23
  • @TheOneWhoMade feel free... (some review+formatting?!) OP: questions about books/etc. are off-topic, but I think you should search SO a bit more; there are probably some old questions about how the 8086 was designed, etc. Starting from the early era may chew up a considerable amount of time, but those designs were still quite human and understandable. Learning about modern x86 like the i7 is more problematic, as the resources are very scarce (how it really operates is Intel's commercial know-how); still, performance-tuning knowledge like Agner Fog's tables reveals a great deal of the internal architecture. – Ped7g Feb 03 '18 at 20:24
  • Intel's earlier processor, the 8080, had different instruction names for different types of moves, like MOV, MVI, LDA, LXI, STA, STAX, etc, etc. Rather inconvenient, especially when the competing Z80 used LD (load) for everything. So for the 8086 Intel decided to call the data transfer instructions MOV, and there are lots of them. Different encodings, but using the same name. – Bo Persson Feb 04 '18 at 00:54

1 Answer


The CPU DOES interpret these based on the arguments, particularly in x86. There is a first byte, the opcode, and as your document (and certainly a number of better documents) shows, there is a table of opcodes and what they map to. Based on the opcode the CPU may need another byte (or more), and as it parses each byte after the initial opcode the picture becomes clearer as to what the CPU is supposed to do. This is a mov, but what kind? Okay, it is register to register. Okay, which two registers? It ends up with some number of bits indicating each of the registers. Or: this is a mov, okay, what kind? Register and immediate, and then more bits and bytes indicating the register and the immediate.
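
As a rough illustration of that byte-by-byte narrowing, here is a minimal C sketch that recognizes only two 32-bit mov forms: opcode 0x89 (mov r/m32, r32, restricted here to the register-to-register case) and opcodes 0xB8 to 0xBF (mov r32, imm32). The names decode_mov and struct decoded are invented for this example, and prefixes, memory operands and every other opcode are ignored, so treat it as a sketch and not as how a real decoder is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
enum mov_kind { MOV_NONE, MOV_REG_REG, MOV_REG_IMM };

struct decoded {
    enum mov_kind kind;
    unsigned dst;     /* register number 0..7: eax, ecx, edx, ebx, esp, ebp, esi, edi */
    unsigned src;     /* source register (MOV_REG_REG only) */
    uint32_t imm;     /* immediate value (MOV_REG_IMM only) */
    size_t length;    /* bytes consumed; 0 if the bytes were not recognized */
};

struct decoded decode_mov(const uint8_t *code, size_t len)
{
    struct decoded d = { MOV_NONE, 0, 0, 0, 0 };
    assert(code != NULL);
    if (len < 1)
        return d;

    uint8_t opcode = code[0];            /* first byte: which instruction is this? */

    if (opcode == 0x89 && len >= 2) {    /* mov r/m32, r32: a ModRM byte follows */
        uint8_t modrm = code[1];
        if ((modrm >> 6) == 3) {         /* mod == 3 means r/m is a register too */
            d.kind = MOV_REG_REG;
            d.src = (modrm >> 3) & 7;    /* reg field */
            d.dst = modrm & 7;           /* r/m field */
            d.length = 2;
        }
    } else if (opcode >= 0xB8 && opcode <= 0xBF && len >= 5) {
        /* mov r32, imm32: the register is in the low 3 bits of the opcode itself */
        d.kind = MOV_REG_IMM;
        d.dst = opcode & 7;
        d.imm = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
        d.length = 5;
    }
    return d;
}
```

Feeding it the bytes 89 D8 yields a register-to-register mov with destination 0 (eax) and source 3 (ebx); the bytes B9 78 56 34 12 yield a register-immediate mov into register 1 (ecx) with the immediate 0x12345678.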

But we don't think of these as a function with operands, so we don't normally write assembly language that way. Instead, the assembly language is designed to narrow in on the specific machine language bits for a very specific instruction:

mov bx,1234h
mov ah,byte ptr [bx]

If I remember right, Intel may actually have don't-care bits or other features where you can encode the same thing in two different, equally valid ways (see the a86 documentation, or is it as86?).
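
One concrete case, assuming 32-bit mode and the standard ModRM layout: a register-to-register add has two valid encodings, because opcode 0x01 is add r/m32, r32 and opcode 0x03 is add r32, r/m32. The helper names in this sketch are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit register numbers as used in the ModRM byte. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Build a ModRM byte with mod = 3, meaning both operands are registers. */
static uint8_t modrm_reg(unsigned reg, unsigned rm)
{
    assert(reg < 8 && rm < 8);
    return (uint8_t)(0xC0 | (reg << 3) | rm);
}

/* add dst, src via opcode 01 /r (add r/m32, r32): dst sits in r/m, src in reg. */
void encode_add_form_01(unsigned dst, unsigned src, uint8_t out[2])
{
    out[0] = 0x01;
    out[1] = modrm_reg(src, dst);
}

/* The same add via opcode 03 /r (add r32, r/m32): dst sits in reg, src in r/m. */
void encode_add_form_03(unsigned dst, unsigned src, uint8_t out[2])
{
    out[0] = 0x03;
    out[1] = modrm_reg(dst, src);
}
```

For add eax, ebx the first form produces the bytes 01 D8 and the second 03 C3. Both do the same thing when executed; the assembler simply picks one.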

Intel did/does have an instruction set (not x86 or related) with a function-based assembly language. I won't say what it was, but I just searched and there are legal and/or illegal class lectures, etc. that show some of this syntax.

alu[x,--,y,+,z]

This likely intentionally resembles a function; it is saying x = y + z. I think you can use parentheses instead of brackets, and the macro language looks like functions as well:

  mymacro(w,x,y,z)

where those items can be used in the macro however you want. Like a #define in C, the macro just shoves the text in as is, so you have to make sure the result conforms.

  alu[w,--,x,+,y]
  alu[w,--,w,and,z]

I have seen someone use C as an assembler, and that was very, very cool. It was super easy to implement the assembler, as the C compiler took care of all of the parsing:

add(r0,r1);
sub(r2,r3);

and you link/include the backend with your main "program":

#define r0 0
#define r1 1
#define r2 2
#define r3 3

/* emit() is assumed to be provided by the backend; it outputs one encoded instruction */
void add ( unsigned int ra, unsigned int rb)
{
   emit(0x4140|(ra<<3)|rb);
}

I don't remember how conditionals worked; I think there was something like

label("hello");
...
bne("hello");

and forward references were patched up at the end.
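
To make the patching idea concrete, here is a minimal sketch of such a backend. It is my own reconstruction, not the tool I saw: the 0x4140 pattern for add comes from the snippet above, while the bne encoding (0xD100 plus an 8-bit signed word offset relative to the next instruction), the table sizes and the resolve() function are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define r0 0
#define r1 1
#define r2 2
#define r3 3

#define MAX_WORDS  256
#define MAX_LABELS 32
#define MAX_FIXUPS 32

uint16_t code[MAX_WORDS];    /* assembled output, one 16-bit word per instruction */
unsigned pc;                 /* index of the next word to emit */

struct label_entry { const char *name; unsigned addr; };
struct fixup_entry { const char *name; unsigned at; };
struct label_entry labels[MAX_LABELS];
struct fixup_entry fixups[MAX_FIXUPS];
unsigned nlabels, nfixups;

static void emit(uint16_t word)
{
    assert(pc < MAX_WORDS);
    code[pc++] = word;
}

void add(unsigned ra, unsigned rb)
{
    emit((uint16_t)(0x4140 | (ra << 3) | rb));
}

/* Record the current position under the given name. */
void label(const char *name)
{
    assert(nlabels < MAX_LABELS);
    labels[nlabels].name = name;
    labels[nlabels].addr = pc;
    nlabels++;
}

/* Emit a branch with an empty offset and remember that it needs patching. */
void bne(const char *name)
{
    assert(nfixups < MAX_FIXUPS);
    fixups[nfixups].name = name;
    fixups[nfixups].at = pc;
    nfixups++;
    emit(0xD100);            /* hypothetical opcode; offset filled in by resolve() */
}

/* Patch every recorded branch. Returns 0 on success, -1 on an unknown
   label or an offset that does not fit in 8 signed bits. */
int resolve(void)
{
    for (unsigned i = 0; i < nfixups; i++) {
        unsigned j;
        for (j = 0; j < nlabels; j++)
            if (strcmp(labels[j].name, fixups[i].name) == 0)
                break;
        if (j == nlabels)
            return -1;
        int off = (int)labels[j].addr - (int)(fixups[i].at + 1);
        if (off < -128 || off > 127)
            return -1;
        code[fixups[i].at] |= (uint16_t)(off & 0xFF);
    }
    return 0;
}
```

Because every branch is patched in resolve() after the whole "program" has run, backward and forward references are handled the same way, and the source file is just a sequence of calls such as label("hello"); add(r0,r1); bne("hello"); followed by one call to resolve().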

You could bang out an assembler in record time with an approach like that, but most folks have probably never seen this, nor tried to use or implement it.

Some processors have fixed-length instructions and some have variable-length ones. Variable length is more like an opcode byte, and from that you figure out how many more opcode bytes follow, and then the operands. Fixed length still has opcode fields/bits, and based on those you interpret the rest of the bits. It is very easy to implement a fixed-length decoder that settles in a single clock cycle. Not that you couldn't have a wide shift register for x86 that settled in one clock too, but historically that was not how it worked, and CISC leans toward microcoding.
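
A fixed-length format can be pulled apart with nothing but shifts and masks, which is part of why it decodes so easily. The sketch below reuses the 0x4140|(ra<<3)|rb layout from the add() example above as a hypothetical 16-bit format (opcode in bits 15..6, ra in bits 5..3, rb in bits 2..0); the field split and the function names are assumptions, not any real instruction set.

```c
#include <assert.h>
#include <stdint.h>

struct fields {
    unsigned opcode;   /* bits 15..6 */
    unsigned ra;       /* bits 5..3 */
    unsigned rb;       /* bits 2..0 */
};

/* Pack the three fields into one 16-bit instruction word. */
uint16_t encode_fixed(unsigned opcode, unsigned ra, unsigned rb)
{
    assert(opcode < 1024 && ra < 8 && rb < 8);
    return (uint16_t)((opcode << 6) | (ra << 3) | rb);
}

/* Every field sits at a fixed position, so no byte-by-byte parsing is needed. */
struct fields decode_fixed(uint16_t insn)
{
    struct fields f;
    f.opcode = insn >> 6;
    f.ra = (insn >> 3) & 7;
    f.rb = insn & 7;
    return f;
}
```

With this layout the word 0x4153 splits into opcode 0x105, ra = 2 and rb = 3, all extracted independently of each other.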

An assembly language is defined by the assembler, the program that parses it. The syntax is up to the author(s) of that assembler; so long as the output conforms to the machine language, you can make any syntax you can dream up. There is no reason whatsoever you can't make an x86 assembler that is function-like with operands:

  movrr(ax,bx);  //mov ax,bx
  movbptr(ah,bx); //mov ah,byte ptr [bx]

This works so long as you can implement enough instructions to be useful, ideally the entire instruction set, and ideally each line creates one specific instruction. The problem you would have at this point in history is finding a user/consumer of this tool. Hardly anyone codes purely in assembly language, so the primary consumers are compilers, and those already have an assembly language they use as output, one they didn't have to write (an existing one like gas).
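
A minimal sketch of what such a function-like x86 front end might look like in C, covering only a 16-bit register-to-register mov and a byte load through a base register. It assumes 16-bit operand and address size (real mode, or the matching prefixes elsewhere); the function names movrr and movbptr, the out buffer and the emit8 helper are this sketch's own choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum reg16 { ax, cx, dx, bx, sp, bp, si, di };   /* 16-bit register numbers */
enum reg8  { al, cl, dl, bl, ah, ch, dh, bh };   /* 8-bit register numbers */

uint8_t out[64];     /* assembled machine code */
size_t out_len;

static void emit8(uint8_t b)
{
    assert(out_len < sizeof out);
    out[out_len++] = b;
}

/* mov dst, src for 16-bit registers: opcode 89 /r with ModRM mod = 3. */
void movrr(enum reg16 dst, enum reg16 src)
{
    emit8(0x89);
    emit8((uint8_t)(0xC0 | (src << 3) | dst));
}

/* mov dst, byte ptr [base]: opcode 8A /r with ModRM mod = 0.
   16-bit addressing only allows si, di or bx as a lone base register. */
void movbptr(enum reg8 dst, enum reg16 base)
{
    unsigned rm;
    switch (base) {
    case si: rm = 4; break;
    case di: rm = 5; break;
    case bx: rm = 7; break;
    default:
        assert(!"not a valid lone 16-bit base register");
        return;
    }
    emit8(0x8A);
    emit8((uint8_t)((dst << 3) | rm));
}
```

Calling movrr(ax,bx); movbptr(ah,bx); leaves the four bytes 89 D8 8A 27 in out, one instruction per call, which is the 1:1 property you want from an assembler.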

old_timer
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/18714694) – vkrams Feb 03 '18 at 22:28
  • @vkrams I just finished re-writing from scratch, please read new answer. – old_timer Feb 03 '18 at 22:40
  • I wonder if that "C as assembler" is what inspired [HLA (High Level Assembly)](https://en.wikipedia.org/wiki/High_Level_Assembly), which also looks like that. e.g. https://stackoverflow.com/questions/38820629/hla-assembly-recursive-fibonacci-program – Peter Cordes Feb 03 '18 at 23:27
  • re: two ways of assembling the same instruction: Yes, most basic x86 integer instructions (like [`add`](https://github.com/HJLebbink/asm-dude/wiki/ADD)) have at least two opcodes, and for reg,reg operands you can use either opcode: either `add r/m32, r32` or `add r32, r/m32`. If either the source or destination is memory, then only one opcode works, but if both are registers, then you can use either opcode. See https://stackoverflow.com/questions/2760794/x86-cmp-instruction-difference for an example with `cmp`. – Peter Cordes Feb 04 '18 at 03:31