Difference between: Opcode, byte code, mnemonics, machine code and assembly

Question

I am quite new to this. I have tried to understand the difference between these terms clearly; however, I am still confused. Here is what I have found:

In computer assembler (or assembly) language, a mnemonic is an abbreviation for an operation. It's entered in the operation code field of each assembler program instruction. For example, AND AC,37 means AND the AC register with 37. So AND, SUB and MUL are mnemonics. They get translated by the assembler.
Instructions (statements) in assembly language are generally very simple, unlike those in high-level programming languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode, plus zero or more operands.

score 50 · Accepted Answer · answered Jul 14 '13 at 11:21


OPCODE: It is a number interpreted by your machine (virtual or silicon) that represents the operation to perform.

BYTECODE: Same as machine code, except it's mostly used by a software-based interpreter (like the Java VM or the CLR).

MNEMONIC: The English word MNEMONIC means "a device such as a pattern of letters, ideas, or associations that assists in remembering something." So it's usually used by assembly language programmers to remember the OPERATIONS a machine can do, like "ADD", "MUL" and "MOV". This is assembler-specific.

MACHINE CODE: It is the sequence of numbers that flip the switches in the computer on and off to perform a certain job of work, such as adding numbers, branching, multiplying, etc. This is purely machine-specific and well documented by the implementers of the processor.

Assembly: There are two "assemblies" - one assembly program is a sequence of mnemonics and operands that are fed to an "assembler" which "assembles" the mnemonics and operands into executable machine code. Optionally a "linker" links the assemblies and produces an executable file.

The second "assembly", in CLR-based languages (.NET languages), is a package of CLR code combined with metadata: sort of a library of executable code, but not directly executable by the processor.


Aniket Inge


Java does not use "bytecode". (Not officially, that is.) It's just "code". – Hot Licks Jul 14 '13 at 11:44
@HotLicks Not strictly correct; document http://docs.oracle.com/javase/specs/jvms/se5.0/html/VMSpecTOC.doc.html makes reference to and describes the Java "bytecode verifier" for example. You will typically find terms such as bytecode, virtual machine code (both used in aforementioned document) and others used interchangeably as they essentially mean the same thing. – Nick Jul 14 '13 at 14:38

Hmmm... I see that "The Bytecode Verifier" is indeed used as a section heading, and at least twice in that section. But "bytecode" does not appear in the index, and the term is not used in, eg, the section on class file format. Curiously inconsistent. – Hot Licks Jul 14 '13 at 18:55
@Aniket I was looking for something that states the progression and the relation between these terms. Anyway, thanks for the help, really! – Ahmed Taher Jul 14 '13 at 21:35
@AhmedTaher try adding this requirement to the question. Also, I think that to present an answer in that way will be harder unless you skip the 'byte code' from your question, since, as I know, it is a younger concept than the rest. – n611x007 Nov 08 '13 at 22:10
I think the word "assembly" in [tag:.NET-assembly] is derived from the English-language meaning of the word (a collection of things, alternatively an assemblage). I think it's essentially unrelated to the "assembly language" meaning of the word (a text language for assembling bytes or bits into an output file / stream, usually using executable instructions). – Peter Cordes Jan 29 '18 at 16:53

Nick · Answer 2 · 2018-01-29T16:16:57.860

Aniket did a good job, but I'll have a go too.

First, understand that at the lowest level, computer programs and all data are just numbers (sometimes called words) in memory of some kind. Most commonly these words are a multiple of 8 bits (1's and 0's) wide, such as 32 or 64 bits, but not necessarily, and in some processors each word is considerably larger. Regardless, it's just numbers represented as a series of 1's and 0's, or on's and off's if you like. What the numbers mean is up to whatever or whoever is reading them. In the processor's case, it reads memory one word at a time and, based on the number (instruction) it sees, takes some action. Such actions might include reading a value from memory, writing a value to memory, modifying a value it has read, or jumping to somewhere else in memory to read instructions from.
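To make that fetch-and-act cycle concrete, here is a minimal Python sketch of a hypothetical toy machine (not any real processor). Assumptions: memory is a list of plain integers, each instruction occupies two words (an opcode, then an address), and the opcode numbers for HALT, LOAD, ADD, STORE and JUMP are made up for illustration. Note that the program and its data sit in the same memory; only the way the processor reads them distinguishes one from the other.

```python
# Hypothetical toy machine: memory is a list of plain integers. Each
# instruction is two words: an opcode followed by one operand (an address).
HALT, LOAD, ADD, STORE, JUMP = 0, 1, 2, 3, 4


def run(memory: list[int]) -> list[int]:
    pc, acc = 0, 0  # program counter and accumulator
    while True:
        opcode, operand = memory[pc], memory[pc + 1]  # fetch
        pc += 2
        if opcode == HALT:  # decode, then execute
            return memory
        elif opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == JUMP:
            pc = operand
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")


# Program computes mem[12] = mem[10] + mem[11]; the data shares the same memory.
mem = [LOAD, 10, ADD, 11, STORE, 12, HALT, 0, 0, 0, 40, 2, 0]
print(run(mem)[12])  # 42
```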

In the very early days a programmer would literally flick switches on and off to make changes to memory, with lights on or off to read out the 1's and 0's, as there were no keyboards, screens and so on. As time progressed, memory got larger, processors became more complex, display devices and keyboards for input were conceived, and with that, easier ways to program.

Paraphrasing Aniket:

The OPCODE is part of an instruction word that is interpreted by the processor as representing the operation to perform, such as read, write, jump, add. Many instructions will also have OPERANDS that affect how the instruction performs, such as saying from where in memory to read or write, or where to jump to. So if instructions are 32 bits in size for example, a processor may use 8 bits for the opcode, and 12 bits for each of two operands.
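A sketch of the 32-bit layout just described, assuming (hypothetically) that the 8-bit opcode occupies the top bits and the two 12-bit operands follow it. Real instruction sets choose their own field positions and widths; this only shows how fields are packed into and extracted from one number.

```python
# Hypothetical 32-bit format: [31..24 opcode][23..12 operand A][11..0 operand B]
OPCODE_BITS, OPERAND_BITS = 8, 12
OPERAND_MASK = (1 << OPERAND_BITS) - 1  # 0xFFF


def encode(opcode: int, a: int, b: int) -> int:
    """Pack an opcode and two operands into one 32-bit instruction word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    if not (0 <= a <= OPERAND_MASK and 0 <= b <= OPERAND_MASK):
        raise ValueError("operand out of range")
    return (opcode << 24) | (a << 12) | b


def decode(word: int) -> tuple[int, int, int]:
    """Split a 32-bit instruction word back into (opcode, a, b)."""
    return (word >> 24) & 0xFF, (word >> 12) & OPERAND_MASK, word & OPERAND_MASK


word = encode(0x01, 0x123, 0x456)
print(hex(word))     # 0x1123456
print(decode(word))  # (1, 291, 1110)
```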

A step up from toggling switches, code might be entered into a machine using a program called a "monitor". The programmer would use simple commands to say what memory they want to modify, and enter MACHINE CODE numerically, e.g. in base 16 (hex) using 0 to 9 and A to F for digits.

Though better than toggling switches, entering machine code is still slow and error prone. A step up from that is ASSEMBLY CODE, which uses more easily remembered MNEMONICS in place of the actual number that represents an instruction. The job of the ASSEMBLER is primarily to transform the mnemonic form of the program to the corresponding machine code. This makes programming easier, particularly for jump instructions, where part of the instruction is a memory address to jump to or a number of words to skip. Programming in machine code requires painstaking calculations to formulate the correct instruction, and if some code is added or removed, jump instructions may need to be recalculated. The assembler handles this for the programmer.
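A rough sketch of how an assembler can take over the jump-address bookkeeping, using a hypothetical toy machine with two words per instruction (mnemonics HALT, LOAD, ADD, STORE, JUMP, plus a made-up DATA directive for a literal word). Pass 1 assigns an address to every label; pass 2 replaces mnemonics with opcode numbers and label names with those addresses. This is a simplified two-pass design for illustration, not the algorithm of any particular assembler.

```python
# Hypothetical toy machine: each instruction is two words (opcode, operand).
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3, "JUMP": 4}


def assemble(source: str) -> list[int]:
    # Drop ';' comments and blank lines. Labels go on their own line, e.g. "x:".
    lines = [raw.split(";")[0].strip() for raw in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: work out the address of every label.
    labels, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        elif line.startswith("DATA"):
            address += 1  # a literal data word
        else:
            address += 2  # opcode word + operand word

    # Pass 2: emit numbers, replacing label names with their addresses.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        if parts[0] == "DATA":
            code.append(int(parts[1]))
            continue
        operand = parts[1] if len(parts) > 1 else "0"
        code.append(OPCODES[parts[0]])
        code.append(labels[operand] if operand in labels else int(operand))
    return code


program = """
    JUMP start   ; skip over the two data words
x:
    DATA 40
y:
    DATA 2
start:
    LOAD x
    ADD y
    STORE x      ; x = x + y
    HALT
"""
print(assemble(program))  # [4, 4, 40, 2, 1, 2, 2, 3, 3, 2, 0, 0]
```

Adding one more DATA line before `start:` moves `start` from address 4 to 5, and reassembling updates the JUMP operand automatically; by hand, that is exactly the recalculation that is slow and error prone.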

This leaves BYTECODE, which is fundamentally the same as machine code, in that it describes low-level operations such as reading and writing memory, and basic calculations. Bytecode is typically produced by COMPILING a higher-level language, for example PHP or Java, and unlike machine code for many hardware-based processors, it may have operations to support specific features of the higher-level language. A key difference is that the processor that executes bytecode is usually a program, though hardware processors have been created for interpreting some bytecode specifications, e.g. a processor called SOAR (Smalltalk On A RISC) for Smalltalk bytecode. While you wouldn't typically call native machine code bytecode, for some types of processors such as CISC and EISC (e.g. Linn Rekursiv, from the people who made record players), the processor itself contains a program that is interpreting the machine instructions, so there are parallels.
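As an illustration of a bytecode "processor" that is just a program, here is a minimal stack-based interpreter for a made-up bytecode. The opcode values and the PUSH/ADD/MUL/PRINT/HALT set are assumptions, not any real VM's instruction set. Each opcode is one byte; PUSH is followed by a one-byte operand.

```python
# Hypothetical bytecode: one-byte opcodes; PUSH takes a one-byte operand.
HALT, PUSH, ADD, MUL, PRINT = 0x00, 0x01, 0x02, 0x03, 0x04


def interpret(code: bytes) -> list[int]:
    stack, output, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # operand byte follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())  # pop a value and record it as output
        elif op == HALT:
            return output
        else:
            raise ValueError(f"bad opcode {op:#04x} at {pc - 1}")


# (2 + 3) * 7
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 7, MUL, PRINT, HALT])
print(interpret(program))  # [35]
```

The same `program` bytes would run unchanged wherever this interpreter runs, which is the portability argument made for bytecode elsewhere on this page.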

Very elegant... I was looking for something like this that ties the parts together into a clear picture! — Ahmed Taher, Jul 14 '13 at 21:28
I am studying shellcoding now and found [this](http://redmine.corelan.be/projects/shellcoding/repository/revisions/2/entry/scripts/pveWritebin.pl) script that converts the shellcode (which, as I now understand it, is a hex representation of machine code that performs a specific operation) to a binary file, "as claimed". But this binary file is actually a text file! How does the word "binary" fit in this context? — Ahmed Taher, Jul 14 '13 at 22:55
Strictly speaking, the phrase "binary file" is meaningless and inaccurate, particularly here it seems. It's commonly used when referring to a file whose contents cannot be meaningfully interpreted by a human, e.g. not using a character set such as ASCII. So, for example, a PDF or Word document would be said to be in a binary format because when viewed we could not interpret the contents, whereas a .txt file would be said to be a text file, as each byte in the file directly represents a character. An "executable binary" would be a file whose contents represent a program. — Nick, Jul 14 '13 at 23:34
@AhmedTahler "Very Elegant..... I was looking for something like this that ties the parts together forming a clear picture" Thanks; don't forget to upvote ;) — Nick, Jul 14 '13 at 23:35
Could you expand 'sometimes' in 'sometimes called words' and add actual microprocessor (or whatever applies) examples, and a few counter-examples, to 'most commonly'? — n611x007, Nov 08 '13 at 22:20
@naxa "word" tends to mean the natural unit of memory that a processor accesses. It's common for words to be multiples of 8 bits, e.g. 32 or 64 bits, but so-called VLIW (Very Long Instruction Word) processors with much wider words, e.g. 1024 bits, have been created for parallel fetch and execution of independent instructions. Not all processors use multiples of 8 bits though, such as microcontrollers from microchip.com with 12- and 14-bit instruction words. — Nick, Mar 14 '14 at 19:33
Excellent answer.. I do have a question on opcode. So the opcode is just a way to group operations and operands in long running 1's and 0's? — ns15, Apr 10 '21 at 09:11
@shadow0359 Thanks. It's more that the opcode and operands together are a group that make up the complete instruction. The operands are meaningless without the opcode, and the opcode needs the operands to have data to operate on. The opcode might itself have a few fields, such as a group of bits to identify the type of instruction, e.g. jump relative, jump absolute, add, subtract etc., a set of bits identifying the variant of that instruction, and perhaps bits for some other purpose. — Nick, Apr 11 '21 at 15:46
At what level does all this stuff live? Machine code depends on the processor, but are OPCODEs or BYTECODEs more portable? Are they dependent on an OS? — Rad80, Aug 26 '22 at 08:19
@Rad80 An opcode is the part of an instruction that signifies what operation to perform. Instructions at any level (e.g. microcode, machine code, bytecode) will all have an opcode. Bytecode is usually at a level above machine code, and can have a portability advantage over machine code: a program compiled for an x86 CPU won't run natively on an ARM CPU, whereas a program compiled to bytecode could run on either as long as the bytecode interpreter behaves the same in both cases. Bytecode itself would not be dependent on the OS, but the interpreter might have limited OS support, e.g. Unix only. — Nick, Aug 27 '22 at 17:56
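Regarding the shellcode question in the comments above: assuming the shellcode is written in the common `\xNN` escape notation (the exact input format of the linked script is not assumed here), the text form and the binary form differ as follows. The text is a string of printable ASCII characters that describe byte values; the binary form is the byte values themselves.

```python
# Text form: 20 printable characters describing five byte values.
text = r"\x68\x73\x9d\x00\x01"

# Binary form: strip the "\x" prefixes and convert each hex pair to one byte.
raw = bytes.fromhex(text.replace("\\x", ""))

print(len(text), len(raw))  # 20 5
print(raw)                  # b'hs\x9d\x00\x01'
```

Writing `raw` to a file opened in `"wb"` mode would give a 5-byte file containing the machine code itself; writing `text` gives a 20-byte file of printable characters.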

score 18 · Answer 3 · answered Jan 01 '17 at 10:07


The following line is disassembled x86 code:

68 73 9D 00 01       PUSH 0x01009D73

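68 is the opcode. Together with the following four bytes, it represents a PUSH instruction of x86 assembly language. This form of PUSH pushes a 4-byte (32-bit) immediate value onto the stack. The word PUSH is just a mnemonic that represents opcode 68. The four bytes after the opcode, 73 9D 00 01, are the operand 0x01009D73 stored little-endian (least significant byte first). Taken together, the bytes 68 73 9D 00 01 are the machine code.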
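The decoding above can be checked with a few lines of Python. This sketch only handles this one instruction form (opcode 68, push imm32) and is not a general x86 disassembler.

```python
# The five bytes from the disassembly line above (fromhex ignores the spaces).
machine_code = bytes.fromhex("68 73 9D 00 01")

opcode = machine_code[0]                             # 0x68 = push imm32
value = int.from_bytes(machine_code[1:5], "little")  # x86 stores it little-endian
print(f"opcode {opcode:#04x}: PUSH {value:#010x}")   # opcode 0x68: PUSH 0x01009d73
```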

Machine code is for real machines (CPUs), but bytecode is pseudo machine code for virtual machines.

When you write Java code, the Java compiler compiles it and generates bytecode (a .class file), and you can execute the same code on any platform without changing it.

                     JAVA CODE
                         |
                         |
                     BYTE CODE
         ________________|_______________
         |               |               |
      x86 JVM        SPARC JVM        ARM JVM
         |               |               |
         |               |               |
        x86            SPARC            ARM
   MACHINE CODE     MACHINE CODE    MACHINE CODE


Fırat Küçük



Thanks for this -- But why does Java need to generate bytecode? Why can't it just create the machine code (depending on which machine it's working with) and skip the intermediary steps? – Moondra Jul 19 '17 at 19:07

@Moondra Because of portability. Java's main motto is "Write once, run anywhere", so you can run the same executable on every platform. Otherwise you must compile the code for each CPU or OS. – Fırat Küçük Jul 19 '17 at 19:38

@FıratKÜÇÜK I see, thank you. So I'm assuming something like C++, which gets compiled directly to machine code, has to be built separately for different OSes and CPUs? – Moondra Jul 19 '17 at 23:34

@Moondra Yes, platform-specific compilation requires a different executable for every single platform. – Fırat Küçük Jul 20 '17 at 14:14
`68` is only one of the opcodes that x86 `push` can assemble to. That's `push imm32`. There are other opcodes for `push register` (with the register number as part of the opcode byte), and for `push imm8`, and `push r/m` (addressing via a modr/m byte). – Peter Cordes Jan 29 '18 at 16:58
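To illustrate the comment above, here is a sketch that encodes three of those PUSH forms for 32-bit x86: push r32 (opcode 50+r, register number folded into the opcode byte), push imm8 (6A, sign-extended), and push imm32 (68). It assumes 32-bit mode and omits push r/m32 (FF /6) and segment registers; `encode_push` is a hypothetical helper name, and picking the short imm8 form whenever the value fits is a choice made here for illustration.

```python
# Register numbers used in the x86 encoding of 32-bit registers.
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_push(operand):
    """Encode `push <reg>` or `push <immediate>` for 32-bit x86 (sketch)."""
    if isinstance(operand, str):
        return bytes([0x50 + REGS[operand]])  # push r32: one byte, 50+r
    if -128 <= operand <= 127:
        return b"\x6a" + operand.to_bytes(1, "little", signed=True)  # push imm8
    if not -2**31 <= operand < 2**32:
        raise ValueError("immediate does not fit in 32 bits")
    return b"\x68" + (operand & 0xFFFFFFFF).to_bytes(4, "little")  # push imm32


for op in ["eax", "ebp", 5, 0x01009D73]:
    print(encode_push(op).hex(" "))
# 50
# 55
# 6a 05
# 68 73 9d 00 01
```

The last line reproduces the `68 73 9D 00 01` bytes from the answer above, while the same mnemonic PUSH yields different opcodes for the other operand kinds.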

score 2 · Hot Licks · Answer 4 · 2013-07-14T12:16:34.393

"Assembly" originates from the very early code "assemblers" which would "assemble" programs from multiple files (what we would now call "include" files). (Though note the "files" were often card decks.) The use of the term "assembly language" to refer to a mnemonic representation of the code is a back-formation from "assembler", and somewhat imprecise, since a number of "assemblers" do not support include files and hence do not "assemble".

It's interesting to note that "assemblers" were invented to support "subroutines". Originally there were "internal" and "external" subroutines. "Internal" subroutines were what we would now call "inline", whereas "external" ones were reached via a primitive "call" mechanism. There was much controversy at the time as to whether "external" subroutines were a good idea or not.

"Mnemonic" comes from Greek, sharing its root with the name of Mnemosyne, the goddess of memory. Anything that helps you remember stuff is a "mnemonic device".

edited Jul 14 '13 at 12:16

answered Jul 14 '13 at 11:55

Hot Licks



As a non-native speaker, I was always confused that people say 'mnemonic device', because, as far as a non-English foreigner can tell, the word 'device' is used mostly in connection with computers. My understanding today is that this choice of words is just an unfortunate coincidence, and 'mnemonic device' has *nothing* to do with computers. 'Device' is only used in a general sense, meaning something like 'a tool that enables something or makes it easier to do'. – n611x007 Nov 08 '13 at 22:15
@naxa - The term "mnemonic device", in English, goes back at least 40 years, and likely 100 or more. Nothing "unfortunate" about it -- English is teeming with ambiguous words and phrases, and, though it do doubt makes it difficult for the non-native speaker, it makes the language much richer and more poetic. – Hot Licks Nov 08 '13 at 22:58

Unfortunate for one who lets confusion take over without a fight. The language is beautiful. – n611x007 Nov 09 '13 at 01:13
Oops - make that "... no doubt ..." – Hot Licks Nov 09 '13 at 03:30

score 1 · Answer 5 · edited Jun 20 '20 at 09:12

Recently I read a good article on this, "Difference between Opcode and Bytecode", and would like to share it with anyone looking for a good explanation of this topic. All the credit goes to the original author.

Opcode:

Opcode is short for operation code. As its name suggests, the opcode is a type of code that tells the machine what to do, i.e. what operation to perform. Opcode is a type of machine language instruction.
Bytecode:

Bytecode is similar to opcode in nature, as it also tells the machine what to do. However, bytecode is not designed to be executed by the processor directly, but rather by another program.
It is most commonly used by a software-based interpreter such as the Java VM or the CLR, which converts each generalized machine instruction into one or more specific machine instructions that the computer's processor will understand.
In fact, the name bytecode comes from instruction sets that have one-byte opcodes followed by optional parameters.
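The "one-byte opcode followed by optional parameters" idea can be sketched as a tiny disassembler. The opcode table below (HALT, PUSH, ADD, MUL and their numbers) is hypothetical; the point is that the decoder must know, for each opcode, how many parameter bytes follow it.

```python
# Hypothetical format: opcode byte -> (mnemonic, number of parameter bytes).
OPCODES = {0x00: ("HALT", 0), 0x01: ("PUSH", 1), 0x02: ("ADD", 0), 0x03: ("MUL", 0)}


def disassemble(code: bytes) -> list[str]:
    listing, pc = [], 0
    while pc < len(code):
        name, nparams = OPCODES[code[pc]]
        params = list(code[pc + 1 : pc + 1 + nparams])
        listing.append(f"{pc:04}: {name} {' '.join(map(str, params))}".rstrip())
        pc += 1 + nparams  # skip the opcode and its parameters
    return listing


for line in disassemble(bytes([0x01, 2, 0x01, 3, 0x02, 0x00])):
    print(line)
# 0000: PUSH 2
# 0002: PUSH 3
# 0004: ADD
# 0005: HALT
```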

*Opcode is a type of machine language instruction.* No, the opcode is *part of* every machine instruction. Some instructions have just an opcode, some have more bits / bytes to specify operands. — Peter Cordes, Jan 29 '18 at 17:01

score -3 · Answer 6 · edited Apr 02 '15 at 14:18


Machine code is in binary, but mnemonics are in letters/words that stand for ideas (MOV, ADD, etc.).
Machine code is a language, but mnemonic code is a part of assembly language.

edited Apr 02 '15 at 14:18

legoscia


answered Apr 02 '15 at 12:09

Arindam bera

