Is it necessary that each machine code can only map to one assembly code?

Question

Suppose these two are essentially the same:

push 1

and

0x1231

That is, each assembly instruction maps to one machine code.

But is it necessary that each machine code can only map to one assembly code?

score 4 · Answer 1 · answered Apr 07 '10 at 13:24


MIPS assembly language has several "pseudoinstructions". For example, "move" is internally just an "add" with an implicit $0 operand.
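To make the pseudoinstruction idea concrete, here is a minimal Python sketch of how an assembler might expand "move" into an R-type "add" with $zero as the second source. The function names are made up for illustration, and it follows the description above in using "add"; a real MIPS assembler may well pick "addu" or "or" instead.

```python
# Hypothetical mini-encoder for MIPS R-type instructions (illustration only).
# Register numbers follow the usual MIPS convention.
REGS = {"$zero": 0, "$t0": 8, "$s0": 16}
FUNCT_ADD = 0x20  # funct field for "add"; the opcode field is 0 for R-type


def encode_add(rd, rs, rt):
    """Encode 'add rd, rs, rt' as a 32-bit word (shamt = 0)."""
    return (REGS[rs] << 21) | (REGS[rt] << 16) | (REGS[rd] << 11) | FUNCT_ADD


def encode_move(rd, rs):
    """'move rd, rs' is treated here as 'add rd, rs, $zero'."""
    return encode_add(rd, rs, "$zero")


print(hex(encode_move("$t0", "$s0")))
print(hex(encode_add("$t0", "$s0", "$zero")))
```

Both calls produce the same word, so a disassembler looking at it has to choose which of the two spellings to print.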


dan04


score 3 · Answer 2 · answered Apr 07 '10 at 01:28


You could perfectly well define an assembler program that supports "synonyms" for instructions: no harm is done if you let the user code FOO meaning exactly the same as BAR. I don't know offhand of assemblers that do that, but you can certainly achieve the same effect with a trivially simple macro in any macro-assembler;-).
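As a rough sketch of what such synonyms could look like inside an assembler, the Python below maps alias mnemonics onto a canonical one before looking up the opcode. The table and function names are hypothetical; the opcodes are the x86 short-jump encodings (0x74 for JE/JZ, 0x75 for JNE/JNZ).

```python
# Hypothetical synonym handling in a toy assembler (illustration only).
OPCODES = {"je": 0x74, "jne": 0x75}      # x86 short conditional jumps
SYNONYMS = {"jz": "je", "jnz": "jne"}    # alias -> canonical mnemonic


def assemble_short_jump(mnemonic, rel8):
    """Return the two bytes for a short conditional jump."""
    canonical = SYNONYMS.get(mnemonic, mnemonic)
    return bytes([OPCODES[canonical], rel8 & 0xFF])


print(assemble_short_jump("je", 5).hex())
print(assemble_short_jump("jz", 5).hex())
```

Two different source lines give identical bytes, so the mapping from machine code back to assembly text is no longer unique.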


Alex Martelli


How does the processor segment machine code? By two words per instruction? – Mask Apr 07 '10 at 01:48
The binary format of machine code, and the syntax of the assembly language which generates that machine code, are quite uncorrelated. x86, for example, has binary instructions of widely varying lengths, from one byte on up, but each is generated from a single assembly language instruction. – Alex Martelli Apr 07 '10 at 02:01
From the processor's point of view, it is all a sequence of bits. How does it know the starting and finishing bits for each instruction? – Mask Apr 07 '10 at 02:02
@Mask, all modern processors use sequences of words (possibly including bytes), not bits. Those with instructions of different lengths obviously have some extra logic such that, depending on the first byte or word, they know how many more they'll need. Again, the assembler (whose job is to read assembly code text and generate binary machine code) has nothing to do with the case. – Alex Martelli Apr 07 '10 at 02:40
1

Yes, assemblers actually do this. Almost every x86 assembler does this for `je` and `jz`. They mean the exact same thing, but sometimes one of the two is a bit easier for a programmer to understand. – Earlz May 25 '11 at 18:10

Andras Vass · Answer 3 · 2010-04-07T14:54:22.070

2

Even without synonyms, an assembly instruction can map to more than one machine code.
E.g. add eax, ebx can be represented as either 03 C3 or 01 D8.
In fact, this can be useful, e.g. to identify particular compilers.
You can find more examples in this article.
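A small Python sketch shows where the two encodings come from. It assumes register-direct ModRM bytes (mod = 11) and only a handful of 32-bit registers; the helper names are invented for the example. Opcode 01 is ADD r/m32, r32 and opcode 03 is ADD r32, r/m32, so swapping the roles of the reg and r/m fields gives the same operation.

```python
# Illustration only: two encodings of a register-to-register ADD on x86.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def modrm(reg, rm):
    """Build a ModRM byte with mod = 11 (both operands are registers)."""
    return 0xC0 | (reg << 3) | rm


def encode_add_reg_reg(dst, src):
    """Return both valid encodings of 'add dst, src'."""
    d, s = REG32[dst], REG32[src]
    form_01 = bytes([0x01, modrm(reg=s, rm=d)])  # ADD r/m32, r32
    form_03 = bytes([0x03, modrm(reg=d, rm=s)])  # ADD r32, r/m32
    return form_01, form_03


for encoding in encode_add_reg_reg("eax", "ebx"):
    print(encoding.hex(" "))
```

An assembler is free to emit either form, which is what makes the choice usable as a fingerprint.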

The reverse can also be true, in a way.
The example is a bit far-fetched, but the same machine code (F3 90) maps to either REP NOP or PAUSE on x86.
Which one is executed, depends on the CPU the code runs on.
Although the same opcode was chosen deliberately, and the two make no difference as far as processor state is concerned, the execution time and the exact internal implementation can differ on an HT (PAUSE) vs. a non-HT (NOP) CPU.

Apart from PAUSE vs. REP NOP, which makes little difference, it is possible to write machine code that is hard to disassemble statically.
E.g. one can carefully construct a machine code sequence that results in completely different assembly instructions if the disassembly starts at say offset 0 vs offset 1.
One can also write self-modifying assembly code to make static analysis harder.
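To illustrate the offset effect, here is a toy Python disassembler that understands only three x86 opcodes (NOP, RET, and MOV r32, imm32). It is a sketch for this one byte sequence, not a real decoder, and the names are made up. Note that the first byte of each instruction determines how many bytes follow, which is also how the length of each instruction is known.

```python
# Toy disassembler for a tiny subset of x86 (illustration only).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code, offset=0):
    """Decode linearly from 'offset'; unknown bytes are shown as 'db'."""
    out = []
    i = offset
    while i < len(code):
        op = code[i]
        if op == 0x90:                      # NOP, 1 byte
            out.append("nop")
            i += 1
        elif op == 0xC3:                    # RET, 1 byte
            out.append("ret")
            i += 1
        elif 0xB8 <= op <= 0xBF and i + 5 <= len(code):
            # MOV r32, imm32: opcode byte plus 4 immediate bytes
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {REG32[op - 0xB8]}, {imm:#x}")
            i += 5
        else:
            out.append(f"db {op:#04x}")
            i += 1
    return out


CODE = bytes([0xB8, 0x90, 0x90, 0x90, 0x90, 0xC3])
print(disassemble(CODE, 0))
print(disassemble(CODE, 1))
```

Starting at offset 0 the immediate bytes are swallowed by the MOV; starting at offset 1 the very same bytes decode as four NOPs.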

edited Apr 07 '10 at 14:54

answered Apr 07 '10 at 01:31

Andras Vass


BTW, how does the processor segment machine code? By two words per instruction? – Mask Apr 07 '10 at 01:42
@Mask: If your question is whether there are one byte instructions in machine code, then yes, there are many of them. – Andras Vass Apr 07 '10 at 01:54
No. From the processor's point of view, it is all a sequence of bits. How does it know the starting and finishing bits for each instruction? – Mask Apr 07 '10 at 01:57
@Mask: Opcodes can encode the necessary information about the arguments and their lengths. An opcode table: http://www.sandpile.org/ia32/opc_1.htm An article about the redundancy of machine code: http://www.strchr.com/machine_code_redundancy – Andras Vass Apr 07 '10 at 02:08
@Mask: though a much better - albeit longer and more sophisticated - source is "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M", in which a whole chapter is dedicated to the instruction and opcode format. http://www.intel.com/Assets/PDF/manual/253666.pdf – Andras Vass Apr 07 '10 at 02:17
Interesting tidbit: I seem to recall that the old A86 assembler used a pattern of selecting various machine code synonyms to put a fingerprint on the assembled output. That way, the A86 developer was able to find out who was pirating his assembler... – Brian Knoblauch Aug 17 '11 at 14:45
@BrianKnoblauch: It's worth noting that some chip vendors explicitly forbid the use of bit patterns whose behavior would be equivalent to another instruction form, and that doing so often eases future chip designs. For example, even if an 8088 would regard as equivalent a "mov ax,bx" encoded as "mov reg,ea" where reg=AX and ea=BX, or "mov ea,reg" where ea=AX and reg=BX, requiring that register-to-register moves always use one particular form would free up eight two-byte opcodes for some other purpose. – supercat Jan 07 '14 at 16:48

score 2 · Answer 4 · answered Apr 07 '10 at 01:50

Yes. A real-world example of this is 68k assembler, where

The official mnemonics BCC (branch on carry clear) and BCS (branch on carry set) can be renamed as BHS (branch on higher than or same) and BLO (branch on less than), respectively. Many 68000 assemblers support these alternative mnemonics.

score 0 · Answer 5 · answered Apr 07 '10 at 01:29

I don't see any conceptual reason why you couldn't design an assembly language wherein more than one assembly statement maps to the same opcode on the underlying processor.

I also don't immediately see any particularly good reason to do that, but it's late and maybe I'm missing something.

score 0 · Answer 6 · answered Apr 07 '10 at 01:33

What a particular machine code instruction does is dictated by the processor (or processor family) it is for. And the same machine code instruction will always do fundamentally the same thing.

Normally, a particular machine code instruction will disassemble to only one statement. In some more complex instruction sets, there are several ways to write the same expression in assembler. A good example is indexed lookups. Some statements can also have synonyms but, again, will still mean the same thing to the processor.

However, it is possible for multiple whole assembly syntaxes to exist for an architecture. This has happened for the x86 architecture, where there is the standard syntax as defined by Intel, and another based on one created by AT&T, which is the one used by GCC.

score -2 · Answer 7 · answered Apr 07 '10 at 01:32

Generally the point of assembly is to allow you to directly program the machine without any ambiguity about what will be executed. That pretty much requires a 1:1 mapping.

I wouldn't be surprised if somewhere in some assembler there are some indirect mappings, probably to deal with changes to opcodes in some line of processors. I don't know of any, though.
