I think you are overcomplicating this. Processors are very dumb, very very dumb; they only do what the instructions tell them to do. The programmer is ultimately responsible for laying out a path of valid, sane instructions in front of the processor, much as a train is dumb and only follows its tracks: if we don't lay the tracks properly, the train derails.
Compilers in general convert from one language to another, not necessarily from C to machine code. It could be from, who knows, Java to C++ or something. And not all C compilers output machine code; some output assembly language, and then an assembler is called.
gcc hello.c -o hello
The gcc program itself is mostly just a shell that calls a preprocessor, which does things like expand the includes and defines recursively so that its output is a single file that can be fed to the compiler. That file is then fed to the compiler proper, which may produce other files or internal data structures, and ultimately the actual compiler outputs assembly language. For the command shown above, gcc then calls the assembler to turn that assembly language into an object file containing as much machine code as it can manage; some external references are left for the linker, and the code is generated to deal with those in a sane way per the instruction set.
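If you are curious you can expose those stages yourself with the standard gcc flags; a rough sketch, where the intermediate file names are just examples:

    gcc -E hello.c -o hello.i    # preprocess only: expand includes and defines
    gcc -S hello.i -o hello.s    # compile only: preprocessed C to assembly language
    gcc -c hello.s -o hello.o    # assemble only: assembly language to an object file
    gcc hello.o -o hello         # link the object with the C library and bootstrap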
Then, as directed by whoever prepared the toolchain, gcc invokes the linker from binutils along with the C library bundled with (or pointed to by) the toolchain, and links the hello object file with any other libraries needed, including the bootstrap. For the command shown above, a default linker script prepared by/for the C library in question is used, since one was not indicated on the command line. The linker does its job of placing items where asked, resolving externals, and at times adding instructions to glue these separate objects together, then outputs a file in whatever file format was set as the default when the toolchain was built. And then gcc cleans up the intermediate files, either as it goes or at the end, whatever.
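To give a feel for what a linker script is, here is a minimal sketch in GNU ld syntax with made-up addresses and only the obvious sections; the real scripts shipped with a C library are far longer and more involved:

    ENTRY(_start)
    SECTIONS
    {
        . = 0x10000;              /* made-up load address */
        .text   : { *(.text*) }
        .rodata : { *(.rodata*) }
        .data   : { *(.data*) }
        .bss    : { *(.bss*) }
    }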
A compiler that compiles straight to machine code simply skips the step of calling the assembler, but linking separate objects and libraries, with some form of instructions to the linker about the address space, is still necessary.
malloc is not an instruction; it is a function that is fully realized in machine code once that function is compiled. For performance reasons it is not uncommon for a C library to write that function by hand in assembly language, but either way it is just some other code that gets linked in. A processor can only execute instructions implemented in that processor's logic.
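To make the point concrete, here is a toy sketch of a malloc-like function; the name, heap size, and strategy are made up and a real malloc is vastly more involved, but it compiles down to ordinary loads, stores, compares, and branches like any other function:

    #include <stddef.h>

    static unsigned char heap[4096];   /* made-up fixed-size heap */
    static size_t heap_used;

    void *my_malloc(size_t n)          /* hypothetical name, not the real malloc */
    {
        n = (n + 7u) & ~(size_t)7u;    /* round up to keep 8-byte alignment */
        if (n > sizeof(heap) - heap_used)
            return NULL;               /* out of memory */
        void *p = &heap[heap_used];
        heap_used += n;                /* bump pointer, this toy never frees */
        return p;
    }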
Software interrupts are just instructions. When you execute a software interrupt it is really nothing more than a specialized function call, and the code you are calling is yet more code that someone wrote and compiled into machine code; no magic.
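For example, assuming x86-64 Linux and GCC-style inline assembly (a sketch, not necessarily how your C library does it), a write() ends up as a single syscall instruction (int 0x80 on 32-bit x86) that transfers control to kernel code somebody else wrote and compiled:

    static long sys_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        __asm__ volatile ("syscall"                  /* the "specialized function call" */
                          : "=a"(ret)                /* return value comes back in rax */
                          : "a"(1),                  /* 1 = __NR_write on x86-64 Linux */
                            "D"(fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory"); /* clobbered by the syscall instruction */
        return ret;
    }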
A processor has absolutely no idea what USB is, or PCIe, or a GPU, etc. It only knows the instruction set it was implemented to execute; that is all. Those high-level concepts are not even known to the programming languages, even high-level ones like C, C++, Java, etc. To the processor there are just some loads and stores (memory or I/O in the case of x86), and the sequence and addresses of those are the job of the programmer; to the processor it is just instructions with addresses, nothing magic, nothing special. The addresses are in part determined by the system design of the board: where and how you reach a USB controller, PCIe controller, DRAM, video, etc. Both the board/chip designers and the software folks know what these addresses are and write code to read/write them to make the peripheral work.
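A sketch, with a completely made-up address: from the processor's point of view this is nothing but a store instruction; it is the board design that decides the address happens to land on, say, a UART data register:

    #define UART0_DR (*(volatile unsigned int *)0x4000C000u)  /* hypothetical address */

    void uart_putc(char c)
    {
        UART0_DR = (unsigned int)c;    /* just an ordinary store to an address */
    }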
The processor only knows the instructions it has been designed to execute, nothing more; there is generally no magic. CISC processors like the x86, because of the extra complication per instruction, have historically been implemented using microcode for various reasons, so that is one exception to the no-magic deal. Using microcode is cheaper in various ways than discretely implementing each instruction with a state machine. The implementation is some combination of state machines and, if you will, some other instruction set running on some other processor inside. It is not truly an interpreted deal; it is a hybrid that makes sense from a business and engineering perspective.
RISC as a concept was built on decades of CISC history as well as improvements in production, tools, the advancement of programmers' abilities, etc. So you now see many RISC processors implemented without microcoding: small state machines as needed, but in general nothing that would compare to a CISC instruction set's requirements. There is a trade-off between number of instructions and code space versus chip size and performance (power, speed, etc.).
"On modern architectures, peripherals are accessed in a similar way to memory: via mapped memory addresses on a bus."
Simply look at the instruction set, ideally the 8088/86 hardware and software reference manuals, and then examine a modern processor bus. Today there are many control signals on a bus, indicating not just read vs write, address, and data, but the type of access, cacheable or not, etc. Going back to the 8088/86 days, the designers had a correct notion that peripherals present two kinds of things to talk to. One is control and status registers: I want to set a graphics mode that is this many pixels by this many pixels, with this many colors, using a palette of this depth. Then you have the actual pixels, which you ideally want to access in large groups, a scan line or a frame at a time, in a loop/burst copy. So the control registers you are generally going to access one at a time, randomly; the pixel memory you are generally going to access in bursts, sometimes many bytes at a time.
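In C that difference looks something like this; all names, addresses, and register layouts here are invented purely for illustration:

    #define VIDEO_MODE_REG (*(volatile unsigned int *)0x50000000u)  /* made up */
    #define FRAMEBUFFER    ((volatile unsigned int *)0x60000000u)   /* made up */

    void draw_frame(const unsigned int *pixels, unsigned int count)
    {
        VIDEO_MODE_REG = 0x3u;              /* one random-access control register write */
        for (unsigned int i = 0; i < count; i++)
            FRAMEBUFFER[i] = pixels[i];     /* pixel memory copied in a burst/loop */
    }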
So having a single bit on the bus that indicates I/O vs memory made sense. Remember, we didn't have FPGAs yet and ASICs were nearly unobtainium, so you wanted to help the glue logic as best you could, and adding a control signal here or there helped. Today, in part because the relative cost and risk of producing ASICs is lower, the tools are much better, and programmers' skills and practices have advanced, the things that helped us in the past can get in the way. The notion of control vs memory is still very much present in peripherals, but we don't necessarily need a dedicated control signal nor separate instructions. If you go back before the 8088/86 to some DEC processors, you had specific instructions for the peripherals: you wanted to output a character to the tty, there was an INSTRUCTION for that, not just an address you wrote to. That was the natural progression, and today's approach is to just make everything memory mapped and use generic load and store instructions.
I can't fathom how you got from I/O vs memory to the idea that there isn't x86 machine code; just look at the instruction set to see the I/O instructions and the memory instructions. They are there, and for backward compatibility reasons (which is what kept the Wintel PC world alive for decades) they still work, but they are synthesized into something closer to a memory-mapped solution. At the same time, programmers have migrated away from I/O-mapped access; ideally only very, very old code would be trying to do that, and a combination of hardware and software can still make some of that code work on a modern PC.
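The instructions themselves are easy to see; assuming GCC-style inline assembly on x86, the classic wrappers look roughly like this (port 0x3F8, the traditional COM1 data port, is used purely as an example):

    static inline void outb(unsigned short port, unsigned char val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));  /* I/O-mapped write */
    }

    static inline unsigned char inb(unsigned short port)
    {
        unsigned char val;
        __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));   /* I/O-mapped read */
        return val;
    }

    /* e.g. outb(0x3F8, 'A'); writes a byte to the legacy COM1 data port */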