So, computer programs/executables are just binary data (0's and 1's)?
Yes, like images, videos and other data.
When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?
Yes, in this exact case it will always be correct, as mov al, 61h is always assembled to 0xB0 0x61 (in the Intel 64 and IA-32 Architectures Software Developer's Manuals and other places usually written as B0 61) in 16-, 32- and 64-bit mode. Note that 0xB0 0x61 = 0b10110000 0b01100001.
You can find the encoding for different instructions in Volume 2A. For example, the entry for this instruction is "B0+ rb MOV r8, imm8 E Valid Valid Move imm8 to r8." on page 3-644.
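To make the "B0+ rb" notation a bit more concrete: the opcode byte is 0xB0 plus the number of the 8-bit register, and the byte that follows is the immediate. Here is a minimal Python sketch of that one encoding. It is not a real disassembler, decode_mov_r8_imm8 and R8_NAMES are names I made up for illustration, and it assumes there are no prefixes (so no REX prefix in 64-bit mode).

```python
# 8-bit register numbering used when no REX prefix is present.
R8_NAMES = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_mov_r8_imm8(code: bytes) -> str:
    """Decode a two-byte 'B0+rb ib' instruction (MOV r8, imm8)."""
    if len(code) < 2:
        raise ValueError("need an opcode byte and an immediate byte")
    opcode, imm8 = code[0], code[1]
    if not 0xB0 <= opcode <= 0xB7:
        raise ValueError("not a MOV r8, imm8 opcode")
    # The register number is simply the offset from the base opcode 0xB0.
    reg = R8_NAMES[opcode - 0xB0]
    return f"mov {reg}, {imm8:02x}h"


print(decode_mov_r8_imm8(bytes([0xB0, 0x61])))  # mov al, 61h
```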
Other instructions have different meanings depending on whether they are interpreted in 16-, 32- or 64-bit mode. Consider this short sequence of bytes: 66 83 C0 04 41 80 C0 05
In 16-bit mode they mean:
00000000 6683C004 add eax,byte +0x4
00000004 41 inc cx
00000005 80C005 add al,0x5
In 32-bit mode they mean:
00000000 6683C004 add ax,byte +0x4
00000004 41 inc ecx
00000005 80C005 add al,0x5
And finally in 64-bit mode:
00000000 6683C004 add ax,byte +0x4
00000004 4180C005 add r8b,0x5
So the instructions cannot always be disassembled correctly without knowing the context (and this does not even take into account that things other than code can reside in the text segment, or that the code can do nasty stuff like generating code on the fly or modifying itself).
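The byte 41 in the listings above is the part that changes most between modes. A small Python sketch of how a decoder might treat the bytes 0x40 to 0x4F, assuming no operand-size prefix precedes them (classify_0x4x is a made-up helper name, not part of any real tool):

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def classify_0x4x(byte: int, mode: int) -> str:
    """Say what a single byte in the range 0x40..0x4F means in a given mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40..0x4F are handled here")
    if mode not in (16, 32, 64):
        raise ValueError("mode must be 16, 32 or 64")
    if mode == 64:
        # In 64-bit mode these bytes are not instructions on their own:
        # they are REX prefixes that modify the instruction that follows.
        return "REX prefix"
    # In 16/32-bit mode 0x40..0x47 is INC and 0x48..0x4F is DEC,
    # with the register selected by the low three bits.
    op = "inc" if byte < 0x48 else "dec"
    reg = REG16[byte & 7]
    if mode == 32:
        reg = "e" + reg  # default operand size is 32 bits
    return f"{op} {reg}"


for mode in (16, 32, 64):
    print(mode, classify_0x4x(0x41, mode))
```

For 0x41 this gives inc cx, inc ecx and a REX prefix respectively, which is why the last three bytes turn into a single add r8b,0x5 in 64-bit mode.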
If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?
Yes, in the sense that if the application contains the mov al, 61h instruction, the file will contain the bytes 0xB0 and 0x61.
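If you want to try this yourself, here is a minimal sketch in Python (used instead of C#/PHP only to keep it short). It writes the two bytes to a temporary file that stands in for the program, then reads the file back and prints each byte as eight bits. file_as_bits is a made-up helper name.

```python
import os
import tempfile


def file_as_bits(path: str) -> str:
    """Read a file in binary mode and return its bytes as groups of 8 bits."""
    with open(path, "rb") as f:
        data = f.read()
    return " ".join(f"{byte:08b}" for byte in data)


# Stand-in for the program on disk: a file holding just B0 61.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0xB0, 0x61]))
    demo_path = tmp.name

try:
    print(file_as_bits(demo_path))  # 10110000 01100001
finally:
    os.remove(demo_path)
```

A real executable contains headers and other data around the code, so the instruction bytes will sit somewhere inside the file rather than at the very start.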
How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?
After loading the code into memory (with the memory permissions set up correctly), it can just jump to or call the code and have it run. One thing you have to realize is that even though the operating system is just another program, it is a special program, since it got to the processor first! It runs in a special supervisor (or hypervisor) mode that allows it to do things normal (user) programs aren't allowed to do, like setting up preemptive multitasking, which makes sure running processes are automatically made to yield the processor.
The first processor is also responsible for waking up the other cores/processors on a multi-core/multi-processor machine. See this SO question.
Calling code you load yourself directly in C++ (I don't think it is possible in C# without resorting to unsafe/native code) requires platform-specific tricks. For Windows you probably want to look at VirtualProtect, and under Linux mprotect(2). Or, perhaps more realistically, load the code from a file that is then mapped using either this process for Windows or mmap(2) for Linux.
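To give a feel for what "load it and jump to it" looks like, here is a rough sketch. It is written in Python with ctypes rather than C++ so it stays short, and it takes a shortcut: instead of calling mprotect(2) afterwards, it asks mmap for a page that is readable, writable and executable from the start. It assumes Linux on x86-64 and simply returns None anywhere else or if the system refuses such a mapping. run_machine_code and CODE are names made up for this example.

```python
import ctypes
import mmap
import platform
import sys

# x86 machine code for:  mov eax, 0x61 ; ret
CODE = bytes([0xB8, 0x61, 0x00, 0x00, 0x00, 0xC3])


def run_machine_code(code: bytes):
    """Copy raw machine code into an executable page and call it.

    Returns the value left in eax, or None when not on Linux/x86-64
    or when an executable mapping is not permitted.
    """
    if not (sys.platform.startswith("linux") and platform.machine() == "x86_64"):
        return None
    try:
        # Anonymous mapping with read, write and execute permission.
        buf = mmap.mmap(
            -1,
            mmap.PAGESIZE,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return None  # e.g. a security policy forbids W+X memory
    buf.write(code)
    # Get the address of the mapping and treat it as 'int f(void)'.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return func()


print(run_machine_code(CODE))  # 97 (0x61) on x86-64 Linux, None elsewhere
```

In C or C++ the idea is the same: get a suitably protected region from mmap/mprotect (or VirtualAlloc/VirtualProtect on Windows), copy the bytes in, cast the address to a function pointer and call it.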