I have read many times that a compiler translates high-level code into machine language, and whenever I google "machine language" it tells me it is assembly language. On the other hand, when I opened a hello world app written in C++ with Notepad, it showed me something that was anything but assembly. What is actually going on here, and where do binary and bits come in? Please clear up my confusion.
-
910001111101011101001111011. – DeiDei Apr 15 '17 at 11:04
-
If you were expecting to see 1s and 0s in Notepad, know that this would be 1/8th as efficient. You'd have one byte per 0 or 1 instead of one bit. You could certainly make something that displays binary bits just like Sublime has a hexadecimal option. – chris Apr 15 '17 at 11:12
-
Oh god.. we have come so far that this became a "secret".. this is where education about compilers and computer science usually began in the past. – Swift - Friday Pie Apr 15 '17 at 11:58
-
Actually, if you are truly interested, there is a nice book that explains the theory behind the topic: System Software: An Introduction to Systems Programming by Leland L. Beck. The original edition was issued in the 80s, but it systematized EVERY concept used in modern software development. – Swift - Friday Pie Apr 15 '17 at 12:03
-
@Swift Where can I find it man? – S.Saad Apr 15 '17 at 12:07
-
@S.Saad there are digital versions around the net, I saw one at amazon, I think, use google search :P – Swift - Friday Pie Apr 15 '17 at 13:30
-
The reality is, you rarely want to know what assembler statements translate to in machine code - a binary encoding - even though there is a direct correspondence. You can consult manuals for the byte encoding of various instruction sets. Some are relatively simple (like ARM), and some are complex, variable length encodings (like x86[-64]). – Brett Hale Apr 15 '17 at 14:01
4 Answers
At the lowest level, machine language has no human-readable syntax. A program is a sequence of numbers arranged in such a way that, when interpreted by the CPU, it carries out the sequence of instructions that the program's algorithm requires.
Assembly language is a human-readable representation of the machine language. The CPU cannot interpret assembly language directly, so a translation step is needed to go between the two representations. You can run a disassembler program on an executable to see its instructions represented as assembly language mnemonics.
This is somewhat similar to strings, which are text to humans, but to computers they are simply sequences of numbers. For example, when you write "ABC", the computer sees the sequence of numbers 65, 66, 67. It takes an editor program to go between the numeric representation (numbers) and the human-readable representation (letters).
Similarly, a sequence of instructions
AND #0F
OR #30
would look like 41, 15, 09, 48 in the machine code of a simple 8-bit CPU. A translator from assembly language (an assembler) would turn the above text into four numbers; a disassembler would turn the four numbers back into the human-readable text.

-
The first sentence is not universally true. Many machines' machine code is quite readable. For example, PDP-11 machine code is an easy read when presented in octal. – fuz Apr 15 '17 at 12:05
-
@fuz Although PDP-11's code is easily decodable due to its brilliantly designed structure, I wouldn't go as far as calling its octal dump a "human-readable syntax". I could decode parts of my binary without a disassembler, but I would not call that process "reading". – Sergey Kalinichenko Apr 15 '17 at 12:19
-
@dasblinkenlight As a matter of fact, I know at least one person who programs his PDP-11 in octal without an assembler. Perfectly readable and writable. – fuz Apr 15 '17 at 12:38
-
@dasblinkenlight Perhaps change “machine language has no human-readable syntax” to “machine language is typically not human-readable.” – fuz Apr 15 '17 at 12:39
-
@dasblinkenlight I certainly knew a couple of bright people who directly edited or created executable files using only a hex editor. One worked with a PDP-11. Another was the author of a couple of games on an Atari ST (in the mid 80s). The thing is, although it was a binary representation, the format of the instructions was simple and, more importantly, well defined. – Peter Apr 15 '17 at 12:40
Machine language is the raw hex or binary stream of bytes that make up the executable code.
Assembly language is a mnemonic-oriented, human-readable representation of the machine language. Starting from the machine language, it is an interpretation of what the machine language says.

-
Is it possible to learn raw hex or binary stream of bytes and is it still taught? – S.Saad Apr 15 '17 at 11:11
-
@S.Saad: Certainly. The assembler is not magic and you can relatively easily generate machine code by hand. Such encoding and instruction set representations are indeed commonly taught in (good) schools. This is used as a teaching aid in demystifying computing though. Actually writing a non-trivial application in such a manner is tedious and error prone and would rarely be worthwhile. – doynax Apr 15 '17 at 11:19
You write C++; the compiler frontend generates IR (intermediate representation) code (for example, in the case of LLVM/Clang, this is an SSA-form language); the optimizer tweaks the IR; the compiler backend converts the optimized IR to symbolic assembly (for your target CPU); the assembler converts the asm to machine code (the actual numeric values of the instructions and data that the CPU can execute).

Machine language is the lowest-level programming language (except for computers that utilize programmable microcode). Machine languages are the only languages understood by computers.
The computer does not understand the normal code we write in C++. It can understand code only in binary form, i.e. object code form.
You may learn more here: http://www.brighthubengineering.com/consumer-appliances-electronics/115635-machine-language-vs-high-level-languages/

-
"It can understand code only in binary form or object code form" - no "or"; object code *is* in binary form. – Jesper Juhl Apr 15 '17 at 11:45