
I'm studying C++ using the website learncpp.com. Chapter 0.5 states that the purpose of a compiler is to translate human-readable source code to machine-readable machine code, consisting of 1's and 0's.

I've written a short hello-world program and compiled it with g++ hello-world.cpp (I'm using macOS). The result is a.out. It prints "Hello World" just fine. However, when I look at a.out in vim/less/Atom/..., I don't see 1's and 0's, but rather a lot of this:

H�E�H��X�����H�E�H�}���H��X���H9��

Why are the contents of a.out not just 1's and 0's, as would be expected from machine code?

E_net4
ersbygre1
  • 1
    Use a program used to view raw binary. I like HxD. All files - everything - on a computer is binary. Everything you see on your web browser right now as you read this is binary. Information = Data + Context. When you open a file in a particular program, that program interprets that data as if it were an expected context. In atom's case, it expects utf-8 characters. – JohnFilleau Aug 22 '20 at 01:27
  • 1
    Tip: you should get a [Good Book](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). – Geno C Aug 22 '20 at 01:28
  • 1
    Also HxD is what I use on Windows. Try https://stackoverflow.com/q/1765311/2027196 for a linux appropriate answer. – JohnFilleau Aug 22 '20 at 01:29
  • @JohnFilleau Contrary to what is stated in your linked post, xdd is not preinstalled on my Mac, neither was I able to install it via brew. However, I was able to view hex code via hexdump. But so far no luck for the binary version.. – ersbygre1 Aug 22 '20 at 01:35
  • try [godbolt](https://godbolt.org/), you can see generated assembly / machine code on the web – pvc Aug 22 '20 at 02:07
  • @Stephan, click on one of the hex pairs in HxD. On the right hand side of the screen you'll see info about that hex pair. One hex character is directly mappable to 4 bits, and you'll see the 8 bits that represent that hex pair in binary. Eventually you'll memorize which hex character (0 - F) maps to which binary quad (0000 - 1111), and it will be like Mouser reading characters in The Matrix. – JohnFilleau Aug 22 '20 at 02:11
  • 1
    *the purpose of a compiler is to translate human-readable source code to machine-readable machine code* A slightly better way to look at it is: *the purpose of a compiler is to translate the observable behaviour described by human-readable source code to machine-readable machine code*. The compiler is allowed to utterly transform the given code so long as the observable behaviour is maintained. You'll find insanely long and complicated code can result in two or three assembly instructions in some cases, because practically all of the code can be resolved and/or discarded at compile time. – user4581301 Aug 22 '20 at 02:13
  • A guess regarding the downvotes: Your question is based upon misleading information (from the website) exacerbated by incorrect assumptions (on your part). The question *might* (not definitely, but might) be better received if it asked how to view the 1s and 0s (your original issue) instead of why your flawed approach failed to do so. See also [XY problem](https://en.wikipedia.org/wiki/XY_problem). You could still describe your attempt at the end of your question (in case you were on the right track), but ask about the original issue. – JaMiT Aug 22 '20 at 03:13

1 Answer


They are binary bits (1s and 0s), but whatever piece of software you are using to view the file's contents is trying to read them as human-readable characters, not as machine code.

If you think about it, everything you open in a text editor is made up of binary bits stored on bare metal. Those 1s and 0s can be interpreted in many different ways, and most text editors will attempt to read them as characters. Take the character 'A', for example. Its ASCII code is 65, which is 01000001 in binary. When a text editor reads through a file, it processes those bits as characters rather than machine instructions, so when it reads 8 bits (one byte) in the pattern 01000001, it knows it has just read an 'A'.

This process results in the jumble of symbols you see in the executable file. While some of the bytes happen to form the right pattern for human-readable characters, most of them likely fall outside what the character encoding considers valid or what the editor knows how to print, resulting in the '�' that you see.

I won't go into the intricacies of how character encodings work here, but read Character Encodings for Beginners for a bit more info.

joshmeranda
  • Thank you for your helpful answer! I have two follow-up questions: 1) is it correct to say that three files containing source code, assembly code, and machine code, all of them are 1's and 0's, but only the machine code one "makes sense" to the CPU? (meaning it knows what to do) 2) can you recommend a way to view the 1's and 0's of any given file using macOS? – ersbygre1 Aug 22 '20 at 01:47
  • 1
    Yes and no, there are layers between the CPU and the file. When you try to run an executable, it goes through the kernel and the OS first, which check the format and some other things before passing the raw bits to the CPU. In theory, if you passed the raw bits of 'ABCD' directly into the CPU, it would try to read them as machine code, and assuming the patterns fit a valid code, the CPU would execute it in some likely unpredictable way. – joshmeranda Aug 22 '20 at 01:53
  • Sorry, I don't use Mac so I'm of no help there – joshmeranda Aug 22 '20 at 01:54
  • @Stephan They consist of values. You can present those values any way you want. Values are quantities, amounts. The number of fingers I have on each hand is a value, whether you express it as "five", "IIIII", "V" or "three plus two". Saying they're "just 1's and 0's" is really a dumb thing to say. You can represent the values as just 1's and 0's, but you can represent any value as just 1's and 0's, and it's almost always pointless for humans to do that. – David Schwartz Aug 22 '20 at 02:00
  • David's right. At the bottom of the source code you've got 1s and 0s too. They just mean different things. @Stephan the closest you want to get to the machine code, the vast majority of the time, is the assembled output of the program. – user4581301 Aug 22 '20 at 02:16