Understanding how operating systems store/retrieve IO device input

Question

I am a bit confused about how I/O devices like keyboards store their input for use by the operating system or an application. If I have a computer with a single processor (a CPU with a single core), and the currently executing process is a game, how is the game able to be "aware" of keyboard input? Even if a key press were to force a hardware interrupt (and thus a context switch), and the OS then "fed" the key value to the game whenever it gave control back to the game process, there's no guarantee that the game loop would even be checking for player input at that time. It could just as well be updating game object positions or rendering the game world.

So my question is this...

Do I/O devices like keyboards store their input in some kind of on-board, hardware-specific microprocessor or on-board memory buffer queue, which can later be "read" and flushed by external processes or the OS itself? Any insight is greatly appreciated, thanks!

Yes. The keypresses, and their interpretation as whatever, are typically queued/buffered in hardware and software until a thread in the process that has input focus requests them. If that thread had already requested such input and was blocked because no input was available, it is made ready/running. It is common to apply temporary priority boosts to such threads to increase the chance of running 'immediately' upon keypress so as to improve interactive user experience. — Martin James, May 21 '22 at 03:24

score 3 · Accepted Answer · answered May 21 '22 at 06:14

Do I/O devices like keyboards store their input in some kind of on-board, hardware-specific microprocessor or on-board memory buffer queue, which can later be "read" and flushed by external processes or the OS itself?

Let's split it into 3 parts.

The Device Specific Part

For old keyboards (before USB), a microcontroller in the keyboard regularly scans a grid of switches and detects when a key is pressed or released, converts that into a code (which could be multiple bytes), then sends the code one byte at a time to the computer. The computer also has a little microcontroller to receive these bytes. That microcontroller has a 1-byte buffer (not even big enough for multi-byte codes).
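The scanning step described above can be sketched as a small Python model. This is only an illustration, not keyboard firmware: the 2x3 matrix, the `SCAN_CODES` table, and the `scan_matrix` helper are hypothetical, and the one-byte codes loosely follow scan code set 1, where a release code is the press code with bit 7 set.

```python
# Hypothetical 2x3 switch matrix. The values loosely follow scan code set 1,
# where a release ("break") code is the press ("make") code with bit 7 set.
SCAN_CODES = [
    [0x10, 0x11, 0x12],  # Q, W, E
    [0x1E, 0x1F, 0x20],  # A, S, D
]


def scan_matrix(previous, current):
    """Compare two snapshots of the switch grid and return the bytes to send."""
    out = []
    for r, row in enumerate(SCAN_CODES):
        for c, code in enumerate(row):
            was_down, is_down = previous[r][c], current[r][c]
            if is_down and not was_down:
                out.append(code)         # key went down: send the make code
            elif was_down and not is_down:
                out.append(code | 0x80)  # key went up: send the break code
    return out


# Two snapshots of the grid: nothing pressed, then only "A" pressed.
idle = [[False, False, False], [False, False, False]]
a_down = [[False, False, False], [True, False, False]]
```

Calling `scan_matrix(idle, a_down)` yields the make code for "A", and `scan_matrix(a_down, idle)` yields the matching break code. A real keyboard would then send those bytes one at a time over the wire.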

For newer keyboards (USB), the keyboard's internals are mostly the same (microcontroller scanning a grid of switches, etc.), but the USB controller regularly asks the keyboard "Anything happen?" (typically every 8 milliseconds) and the keyboard's microcontroller replies.

In any case, the keyboard driver gets the code that came from the keyboard and processes it, typically converting it into a fixed-length "key code", merging it with other data (whether shift, capslock, etc. were active at the time; any Unicode code points that make sense for the key; and so on), and bundling all that into a data structure.
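As a rough sketch of that driver step, the model below turns raw bytes into a small event structure and tracks shift state. Everything here is an assumption made for illustration: the `KeyEvent` fields, the `KeyboardDriver` class, and the tiny `KEY_NAMES` table (scan code set 1 values, single-byte codes only). A real driver handles multi-byte codes, many more modifiers, and keyboard layouts.

```python
from dataclasses import dataclass

# Hypothetical subset of a scancode-to-key table (scan code set 1 values).
KEY_NAMES = {0x1E: "A", 0x1F: "S", 0x2A: "LEFT_SHIFT"}


@dataclass(frozen=True)
class KeyEvent:
    key: str       # fixed "key code", here just a name
    pressed: bool  # True for press, False for release
    shift: bool    # shift state at the time of the event
    text: str      # character produced, or "" if none


class KeyboardDriver:
    def __init__(self):
        self.shift = False

    def handle_byte(self, byte):
        """Translate one raw byte into a KeyEvent (None if the key is unknown)."""
        pressed = (byte & 0x80) == 0       # bit 7 set means "released"
        key = KEY_NAMES.get(byte & 0x7F)
        if key is None:
            return None
        if key == "LEFT_SHIFT":
            self.shift = pressed           # remember modifier state
        text = ""
        if pressed and len(key) == 1:
            text = key if self.shift else key.lower()
        return KeyEvent(key, pressed, self.shift, text)
```

Feeding the bytes `0x2A, 0x1E` (shift down, then A down) produces an event whose `text` is `"A"`, while `0x1E` without shift held produces `"a"`.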

The OS Specific Part

That data structure (created by the keyboard driver) is typically standardized by the OS as a "user input event" (so, same "event" data structure for keyboard, mouse, touchscreen, joystick, ...).

That "user input event" is sent from the driver via some form of inter-process communication (pipes, messages, ...) to something else (e.g. the GUI). This inter-process communication has 2 common behaviors: if the receiving program is blocked waiting to receive an event, then the scheduler unblocks it (cancels the waiting) and schedules it to get CPU time again; and if the receiving program isn't waiting, the event is often put on a queue (in memory) of pending events.
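Both behaviors can be imitated inside one Python process with `queue.Queue`, where a thread stands in for the receiving program. This is only an analogy for the OS mechanism, and `run_demo` and the event strings are hypothetical names.

```python
import queue
import threading


def run_demo(batch):
    """Deliver a batch of events to a receiver thread and return what it saw."""
    events = queue.Queue()  # pending-event queue, held in memory
    received = []

    def receiver():
        while True:
            event = events.get()  # sleeps (no busy-waiting) until an event exists
            if event is None:     # sentinel used by this sketch to stop the loop
                break
            received.append(event)

    # Receiver is not waiting yet: the first event simply sits in the queue.
    events.put(batch[0])

    worker = threading.Thread(target=receiver)
    worker.start()

    # If the receiver is now blocked in get(), each put() wakes it up.
    for event in batch[1:]:
        events.put(event)
    events.put(None)
    worker.join(timeout=5)
    return received
```

Either way the receiver sees the events in the order they were sent, which is why a game that is busy rendering does not lose a key press. The press just waits in the queue until the game asks for it.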

Of course often there's many processes involved, and the "user input event" might be forwarded from one process (e.g. input method editor) to another process (e.g. GUI) to another process (e.g. whichever window has keyboard focus). Also (for old legacy command line stuff) it might end up at a translation layer (e.g. terminal emulator) that converts the events into a character stream (stdin) while destroying most of the information (e.g. when a key is released).

The Language Specific Part

To get the event from high level code, it depends on what the language is and sometimes also on which library is being used. The most common is some kind of "getEvent()" that causes the program to fetch the next event from its queue (from memory), and may cause the program to wait (and not use any CPU time) if there isn't any event yet. However, often that is buried further, such that you register a callback, then something else calls "getEvent()" and, when it receives an event, calls the callback you registered; so it might end up like (e.g. for Java) public boolean handleEvent(Event evt) { switch (evt.id) { case Event.KEY_PRESS: ....
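For a game loop, the usual shape is a non-blocking variant of this: once per frame, drain whatever has queued up and hand each event to the registered callbacks. The `EventPump` class below is a hypothetical sketch of that pattern, not the API of any particular library.

```python
from collections import deque


class EventPump:
    """Minimal model of a per-program event queue with registered callbacks."""

    def __init__(self):
        self.pending = deque()  # events that arrived but were not handled yet
        self.handlers = {}      # event kind -> list of callbacks

    def post(self, kind, data):
        """Called by the 'OS side' whenever an event arrives."""
        self.pending.append((kind, data))

    def on(self, kind, handler):
        """Register a callback for one kind of event."""
        self.handlers.setdefault(kind, []).append(handler)

    def poll(self):
        """Dispatch everything queued so far without waiting; return the count."""
        handled = 0
        while self.pending:
            kind, data = self.pending.popleft()
            for handler in self.handlers.get(kind, []):
                handler(data)
            handled += 1
        return handled
```

A game would call `pump.poll()` at the start of each frame, then update and render. Key presses that arrive during rendering stay in `pending` and are picked up by the next frame's `poll()`.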

Hi Brendan, thank you for the immaculate explanation. My only question is when you say, "USB controller asks the keyboard "Anything happen?" regularly (typically every 8 milliseconds) and the keyboard's microcontroller replies"..., is this creating a full context switch via the OS scheduler every 8ms or does this "polling" happen separately on the USB controller itself and not interrupt any current executing process UNTIL an actual input event occurs? — 4Matt, May 21 '22 at 11:30
@4Matt: Typically the USB controller's driver sets up lists of "what to transfer when" and gives them to the USB controller; then the USB controller uses the lists to figure out what to do each 1 millisecond frame - e.g. in one 1 millisecond frame; USB controller might poll 4 different devices then transfer data to/from 4 more devices; then at the end of the frame the USB controller may/will (if requested by any transfer) generate an IRQ so that the USB controller driver can examine data (and forward info to other drivers). How much is "merely interrupt" or "full task switch" depends on OS. — Brendan, May 21 '22 at 15:32

score 1 · Answer 2 · answered May 21 '22 at 06:30

Keyboards are mostly USB today. On most computers, including ARM computers, you have a USB controller implementing the xHCI specification developed by several major tech companies. If you google "xhci intel spec" or something similar, one of the first results should be a link to the full specification.

The xHCI spec requires implementations to be in the form of a PCI device. PCI is another spec, which is developed by the PCI-SIG group. This spec specifies everything down to hardware requirements. It is not a free spec like xHCI. It is actually quite expensive to obtain (around $3k).

The OS detects PCI devices using ACPI or similar specifications, which can sometimes be board specific (especially for ARM, because all x86 based computers implement ACPI). ACPI tables, found at conventional positions in RAM, mention where to find the base addresses of the memory mapped configuration space of each PCI device.

The OS interacts with PCI devices using registers that are memory mapped into the physical address space. The OS reads/writes at the positions specified by ACPI tables and by the configuration spaces themselves to gather information about a certain device and to make the device do operations on its behalf. The configuration space of a PCI device has a general portion (the same for every device) and a more specific portion (device dependent). In the general portion, there are BAR registers that contain the address of the device dependent portion. Each implementer of the convention can do whatever they want with the device specific portion. The general portion must be somewhat similar for every device so that the OS can properly detect and initialize the device.
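To make the "general portion" concrete, here is a sketch that decodes a few fields from the first 64 bytes of a configuration space. The offsets follow the standard type 0 header layout as I understand it (IDs at 0x00, class code bytes at 0x09 to 0x0B, BAR0 at 0x10), and class code 0x0C/0x03/0x30 identifies an xHCI controller. The `parse_pci_header` helper and the `sample` bytes, including the device ID and BAR value, are made up for illustration.

```python
import struct


def parse_pci_header(cfg):
    """Decode a few fields from the general portion of a PCI config space."""
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0x00)
    prog_if, subclass, base_class = struct.unpack_from("<BBB", cfg, 0x09)
    bar0, bar1 = struct.unpack_from("<II", cfg, 0x10)

    is_memory_bar = (bar0 & 0x1) == 0                        # bit 0: 0 = memory, 1 = I/O
    is_64bit = is_memory_bar and ((bar0 >> 1) & 0x3) == 0x2  # bits 2:1 = 10b
    base = bar0 & 0xFFFFFFF0                                 # low 4 bits are flags
    if is_64bit:
        base |= bar1 << 32                                   # upper half is in the next BAR

    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "is_xhci": (base_class, subclass, prog_if) == (0x0C, 0x03, 0x30),
        "bar0_is_64bit": is_64bit,
        "bar0_base": base,
    }


# Made-up header bytes for a hypothetical xHCI controller.
sample = bytearray(64)
struct.pack_into("<HH", sample, 0x00, 0x8086, 0x1234)          # vendor, device
struct.pack_into("<BBB", sample, 0x09, 0x30, 0x03, 0x0C)       # prog IF, subclass, class
struct.pack_into("<II", sample, 0x10, 0xFEB00004, 0x00000000)  # BAR0, BAR1
```

On real hardware the OS would read these bytes from the memory mapped configuration space rather than from a `bytearray`, and would then access the controller's own registers starting at the decoded base address.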

Today, each type of device has to respect a certain convention to work with current computers. For example, hard disks will implement SATA and the board will have an AHCI (a PCI SATA controller). Similarly, keyboards will implement USB and the board will have an xHCI.

The xHCI itself has complex interaction mechanisms. A summary for keyboards is that the OS will "activate" the interrupt IN endpoint of the keyboard. The OS will then place transfer requests (TRBs) on a circular ring in RAM. For interrupt endpoints, the xHCI will read one transfer request per time interval specified by the OS in a specific xHCI register. When the xHCI executes a TRB (when it does the transfer), it will post an event to another circular ring in RAM called the Event Ring. When a new event is posted, the xHCI triggers an interrupt. In the interrupt handler, the OS will read the Event Ring to determine what happened. For a keyboard, the OS will probably see that a transfer was done and read the data.

Most often, keyboards will return 8 bytes. The bytes describe which keys were pressed at the moment of the transfer. They contain conventional scancodes, so they are not directly in UTF-8 or ASCII format. There is one scancode per keyboard key. It is up to the OS to determine what to do depending on the keyboard key. For example, if the data says that the 'A' key is pressed, then the OS can check whether the 'SHIFT' key is pressed to determine if the 'A' should be uppercase or lowercase. If the next report says that the 'A' key is not pressed, then the OS should consider this a release of the key. In other words, the keyboard is polled by the OS at certain intervals.
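The report handling can be sketched as follows, assuming the common 8-byte boot-protocol layout: byte 0 holds modifier bits, byte 1 is reserved, and bytes 2 to 7 hold up to six pressed keys (the HID documents call these usage IDs, with 0x04 to 0x1D covering the letters a to z). The helper names are hypothetical, and the sketch ignores details such as rollover error codes.

```python
SHIFT_MASK = 0x02 | 0x20  # left shift and right shift bits in the modifier byte


def parse_boot_report(report):
    """Split an 8-byte keyboard report into (modifier byte, set of pressed keys)."""
    if len(report) != 8:
        raise ValueError("expected an 8-byte report")
    modifiers = report[0]
    keys = {code for code in report[2:8] if code != 0}  # 0 means "empty slot"
    return modifiers, keys


def diff_reports(previous, current):
    """Work out presses and releases by comparing two consecutive reports."""
    _, prev_keys = parse_boot_report(previous)
    _, cur_keys = parse_boot_report(current)
    pressed = sorted(cur_keys - prev_keys)
    released = sorted(prev_keys - cur_keys)
    return pressed, released


def to_char(code, modifiers):
    """Map a letter key to a character, honoring shift (None for other keys)."""
    if 0x04 <= code <= 0x1D:
        ch = chr(ord("a") + code - 0x04)
        return ch.upper() if modifiers & SHIFT_MASK else ch
    return None


idle = bytes(8)                                 # nothing pressed
shift_a = bytes([0x02, 0, 0x04, 0, 0, 0, 0, 0])  # left shift + 'A' key
```

Comparing `idle` with `shift_a` reports the 'A' key as newly pressed, and comparing them in the other order reports it as released. That is the inference described above, since the keyboard only ever sends the current state.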

The interrupt handler will probably pass the key to other portions of the kernel and save the key to a process-specific structure. The process will probably poll a lock-protected buffer that will contain events. It could also be a system-wide buffer that simply contains all events.
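One way to picture the process-specific, lock-protected buffer is the toy model below. `InputRouter`, its method names, and the idea of routing by a "focused" process ID are assumptions for illustration. Real kernels differ a lot here, and a real interrupt handler would use a lock suited to interrupt context rather than a Python `threading.Lock`.

```python
import threading
from collections import deque


class InputRouter:
    """Toy model: one bounded key buffer per process, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffers = {}       # pid -> deque of pending keys
        self.focused_pid = None  # process that currently receives keyboard input

    def register(self, pid):
        with self._lock:
            self._buffers[pid] = deque(maxlen=64)  # oldest keys drop when full

    def on_interrupt(self, key):
        """Called from the 'interrupt handler' side with a decoded key."""
        with self._lock:
            buf = self._buffers.get(self.focused_pid)
            if buf is not None:
                buf.append(key)

    def poll(self, pid):
        """Called by a process; returns the next key or None without waiting."""
        with self._lock:
            buf = self._buffers.get(pid)
            if not buf:  # unknown pid or empty buffer
                return None
            return buf.popleft()
```

With two registered processes, only the one whose ID matches `focused_pid` ever sees the key when it polls.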

The higher level implementation probably varies between OSes. If you understand the lower level inner workings, then you can probably imagine how an OS will work. The whole thing is certainly very complex, because CPUs nowadays have several cores, caches, and other complex mechanisms. The OS must make sure to work seamlessly with all these mechanisms, so it is quite complex but achievable. To understand the whole thing, you'd probably need to understand the whole OS. It has lots of ramifications in other OS concepts.
