After asking about the relation between assembly and machine code, I am beginning to read through the Intel 64 instruction set reference.
There is still a lot to learn here, but after looking through the first two chapters (need to study chapter 2 much more), I don't feel any closer to understanding what the machine code means yet. Maybe after reading all 1300+ pages, and the Art of Assembly, and perhaps a CS architecture course, how this applies in practice will start to make sense.
But in the meantime, can you explain why the numbers in a compiled assembly file (or any "binary", I guess is what you'd call it, which is just machine code in my understanding) are organized into a grid of 8 columns with 4 hexadecimal digits each? This may be obvious to you, but I have no idea if it means anything or not.
cffa edfe 0700 0001 0300 0000 0100 0000
0200 0000 0001 0000 0000 0000 0000 0000
1900 0000 e800 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
2e00 0000 0000 0000 2001 0000 0000 0000
2e00 0000 0000 0000 0700 0000 0700 0000
0200 0000 0000 0000 5f5f 7465 7874 0000
0000 0000 0000 0000 5f5f 5445 5854 0000
0000 0000 0000 0000 0000 0000 0000 0000
2000 0000 0000 0000 2001 0000 0000 0000
5001 0000 0100 0000 0005 0080 0000 0000
0000 0000 0000 0000 5f5f 6461 7461 0000
0000 0000 0000 0000 5f5f 4441 5441 0000
0000 0000 0000 0000 2000 0000 0000 0000
0e00 0000 0000 0000 4001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0200 0000 1800 0000
5801 0000 0400 0000 9801 0000 1c00 0000
e800 0000 00b8 0400 0002 bf01 0000 0048
be00 0000 0000 0000 00ba 0e00 0000 0f05
4865 6c6c 6f2c 2077 6f72 6c64 210a 0000
1100 0000 0100 000e 0700 0000 0e01 0000
0500 0000 0000 0000 0d00 0000 0e02 0000
2000 0000 0000 0000 1500 0000 0200 0000
0e00 0000 0000 0000 0100 0000 0f01 0000
0000 0000 0000 0000 0073 7461 7274 0077
7269 7465 006d 6573 7361 6765 006c 656e
6774 6800
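To make the layout concrete, here is a minimal sketch of how a listing in this shape could be produced. It assumes that every pair of hex digits is one byte and that the tool simply prints 2 bytes per group and 8 groups per row; `dump_rows` and its parameters are names made up for illustration, not part of any real tool.

```python
def dump_rows(data: bytes, bytes_per_group: int = 2, groups_per_row: int = 8) -> list[str]:
    """Format raw bytes as rows of space-separated hex groups."""
    row_size = bytes_per_group * groups_per_row
    rows = []
    for start in range(0, len(data), row_size):
        chunk = data[start:start + row_size]
        # Slice the row into fixed-size groups and render each as hex digits.
        groups = [
            chunk[i:i + bytes_per_group].hex()
            for i in range(0, len(chunk), bytes_per_group)
        ]
        rows.append(" ".join(groups))
    return rows


# The first 16 bytes of the listing above, written without spaces.
sample = bytes.fromhex("cffaedfe070000010300000001000000")

for row in dump_rows(sample):
    print(row)  # cffa edfe 0700 0001 0300 0000 0100 0000

# Same bytes, different (hypothetical) display settings: 1 byte per group.
for row in dump_rows(sample, bytes_per_group=1, groups_per_row=16):
    print(row)
```

Under that assumption, changing the two parameters changes only how the bytes are displayed, not the bytes themselves.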
More specifically...
As pointed out in the selected answer in the other question about the relation between assembly and machine code, all the information is at least somewhere in the Intel docs. For example, at the beginning of Chapter 2, they say these things:
- LOCK prefix is encoded using F0H.
- REPNE/REPNZ prefix is encoded using F2H...
The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor environment... Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string...
I understand that by F0H, they really just mean "f0, which is a hexadecimal number in case that isn't clear". So then you can find that number a couple of times in the machine code above. For example, near the bottom in the 6th column is bf01.
Without knowing much more than this, I am trying to put together the very specific (but not very practical) Intel docs with some actual machine code, so I can start to really "get" how the Intel docs are actually applied.
As a first step in that process of understanding, I am wondering this:
- Is the f0 in that bf01 the same thing that the Intel docs are describing? That is, is it the LOCK prefix F0H? Or if not, how do you know that?
- Why are the numbers in a grid of 8 columns of 4 digits each?
- If f0 in the bf01 chunk does mean that LOCK prefix, why is it starting at an odd position (that is, it's not starting at an even position like position 0 or 2 in a column)? This is the main reason for this whole question. If it can appear at an odd position, is breaking the numbers into 8 columns of 4 digits each just arbitrary (i.e. it just makes the output look pretty)? I ask because if all opcodes are at least 2 characters long, I would expect one never to start at an odd position.
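To pin down what "odd position" means here, this is a small sketch that lists where the characters f0 occur in the row containing bf01 and whether each occurrence starts on an even digit offset. It assumes that every pair of hex digits is one byte and that the spaces are display-only; `find_hex_pattern` is a helper name invented for this illustration.

```python
def find_hex_pattern(hex_text: str, pattern: str) -> list[tuple[int, bool]]:
    """Return (digit_offset, starts_on_even_offset) for each occurrence of pattern."""
    digits = "".join(hex_text.split())  # drop the display whitespace
    hits = []
    pos = digits.find(pattern)
    while pos != -1:
        hits.append((pos, pos % 2 == 0))
        pos = digits.find(pattern, pos + 1)
    return hits


# The row from the listing above that contains "bf01".
row = "e800 0000 00b8 0400 0002 bf01 0000 0048"

for offset, aligned in find_hex_pattern(row, "f0"):
    print(offset, aligned)  # 21 False

# Under the one-byte-per-digit-pair assumption, search the raw bytes instead.
raw = bytes.fromhex("".join(row.split()))
print(raw.find(b"\xf0"))  # -1: no byte in this row has the value 0xF0
```

In this row the only f0 starts at digit offset 21, so under the stated assumption it is the second digit of bf followed by the first digit of 01, not a digit pair of its own.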