17

I've been wondering for a while now: how exactly does file streaming work? By file streaming, I mean accessing parts of a file without loading the whole file into memory.
I know (or believe I know) that the C++ classes (i|o)fstream do exactly that, but how is it implemented? Is it possible to implement file streaming yourself?
How does it work at the lowest C / C++ (or any language that supports file streaming) level? Do the C functions fopen, fclose, fread and the FILE* pointer already take care of streaming (i.e., not loading the whole file into memory)? If not, how would you read directly from the hard drive, and is there such a facility already implemented in C / C++?

Any links, hints, pointers in the right direction would already be very helpful. I've googled, but it seems Google doesn't quite understand what I'm after...


Ninja-Edit: If anybody knows anything about how this works at the assembly / machine code level, and whether it's possible to implement this yourself or you have to rely on system calls, that would be awesome. :) Not a requirement for an answer, though a link in the right direction would be nice.

Xeo
  • For a *far* more detailed answer, I'd look into [Operating Systems](http://www.amazon.com/Operating-Systems-Design-Implementation-Second/dp/0136386776) by Andrew Tanenbaum. It's a fantastic book, very thorough and easy to read. – André Caron May 20 '11 at 02:27
  • You have to go out of your way to read directly from the harddrive. The kernel/OS caches the filesystem for a reason and you should practically never evade that. – Fred Nurk May 20 '11 at 02:51
  • @Andre: Thank you very much for the link, I was looking for something like that anyways! – Xeo May 20 '11 at 03:07
  • In fact, Operating Systems by Tanenbaum is good if you want to understand the concept, but if you want to understand the implementation I strongly suggest either "Understanding the Linux Kernel" or "Windows Internals". – malavv May 26 '11 at 16:17

2 Answers

29

At the lowest level (at least for userland code), you'll use system calls. On UNIX-like platforms, these include:

  • open
  • close
  • read
  • write
  • lseek

...and others. These work by passing around file descriptors, which are just opaque integers. Inside the operating system, each process has a file descriptor table containing all of its file descriptors and the relevant information for each, such as which file it refers to and what kind of file it is.

There are also Windows API calls similar to these UNIX system calls.

Windows passes around HANDLEs, which are similar to file descriptors but are, I believe, a little less flexible. (On UNIX, for example, file descriptors can represent not only files but also sockets, pipes, and other things.)

The C standard library functions fopen, fclose, fread, fwrite, and fseek are wrappers around these system calls. They add a user-space buffer on top, so not every call to fread results in a system call.
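To make the wrapper idea concrete, here is a minimal sketch of a buffered reader built directly on open, read, and close. It is not how any particular C library implements FILE; the names my_file, my_open, my_getc, and my_close and the 4096-byte buffer size are made up for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical miniature stand-in for FILE: a descriptor plus a buffer. */
struct my_file {
    int fd;
    unsigned char buf[4096];
    size_t pos; /* index of the next unread byte in buf */
    size_t len; /* number of valid bytes currently in buf */
};

/* Open a file for reading. Returns 0 on success, -1 on failure. */
int my_open(struct my_file *f, const char *path)
{
    f->fd = open(path, O_RDONLY);
    f->pos = 0;
    f->len = 0;
    return f->fd < 0 ? -1 : 0;
}

/* Return the next byte of the file, or -1 at end of file or on error.
   The read system call is only made when the buffer has been used up. */
int my_getc(struct my_file *f)
{
    if (f->pos == f->len) {
        ssize_t n = read(f->fd, f->buf, sizeof f->buf);
        if (n <= 0)
            return -1;
        f->len = (size_t)n;
        f->pos = 0;
    }
    return f->buf[f->pos++];
}

/* Close the underlying descriptor. */
int my_close(struct my_file *f)
{
    return close(f->fd);
}
```

Calling my_getc in a loop walks through a file of any size while only one buffer's worth of it sits in the process's own memory at a time.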

When you open a file, usually none of the file's contents is read into memory. When you use fread or read, you tell the operating system to read a particular number of bytes into a buffer. This particular number of bytes can be, but does not have to be, the length of the file. As such, you can read only part of a file into memory, if desired.
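As a rough illustration of reading only part of a file at a time, the sketch below processes a file in fixed-size chunks with fread. The function name count_newlines and the 4096-byte chunk size are arbitrary choices for the example, not anything the standard prescribes.

```c
#include <stdio.h>

/* Count the newline characters in a file while keeping at most
   sizeof chunk bytes of it in our own buffer at any moment.
   Returns -1 if the file cannot be opened or a read error occurs. */
long count_newlines(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char chunk[4096];
    long newlines = 0;
    size_t n;

    /* fread returns how many bytes it actually delivered; 0 means
       end of file or an error, which ferror distinguishes below. */
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (chunk[i] == '\n')
                newlines++;
    }

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : newlines;
}
```

The same loop works on a multi-gigabyte file, because each iteration reuses the same small buffer.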

Answer to ninja-edit:

You asked how this works at the machine code level. I can only really explain how this works on Linux and the Intel 32-bit architecture. When you use a system call, some of the arguments are placed into registers. After the arguments are placed into the registers, interrupt 0x80 is raised. So, for example, to read one kilobyte from stdin (file descriptor 0) to the address 0xDEADBEEF, you might use this assembly code:

mov eax, 0x03       ; system call number (read = 0x03)
mov ebx, 0          ; file descriptor (stdin = 0)
mov ecx, 0xDEADBEEF ; buffer address
mov edx, 1024       ; number of bytes to read
int 0x80            ; Linux system call interrupt

int 0x80 raises a software interrupt that the operating system usually will have registered in the interrupt vector table or interrupt descriptor table. Anyway, the processor will jump to a particular place in memory. Once there, usually the operating system will enter kernel mode (if necessary) and then do the equivalent of C's switch on eax. From there, it will jump into the implementation for read. In read, it will usually read some metadata about the descriptor from the calling process's file descriptor table. Once it has all the data it needs, it does its stuff, then returns back to the user code.
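The "switch on eax" step can be pictured with a toy dispatcher like the one below. This is only a user-space sketch, not kernel code: the names toy_dispatch and TOY_SYS_* are invented, and instead of real kernel implementations it simply forwards to the libc wrappers so that it can run as an ordinary program. The numbers are the 32-bit x86 Linux system call numbers for read, write, and close.

```c
#include <stddef.h>
#include <unistd.h>

/* System call numbers as used on 32-bit x86 Linux. */
enum { TOY_SYS_READ = 3, TOY_SYS_WRITE = 4, TOY_SYS_CLOSE = 6 };

/* Toy dispatcher: number plays the role of eax, and a, b, c play the
   roles of ebx, ecx, edx. A real kernel would jump to its own
   implementation of each call; this sketch forwards to libc instead. */
long toy_dispatch(long number, long a, long b, long c)
{
    switch (number) {
    case TOY_SYS_READ:
        return read((int)a, (void *)b, (size_t)c);
    case TOY_SYS_WRITE:
        return write((int)a, (const void *)b, (size_t)c);
    case TOY_SYS_CLOSE:
        return close((int)a);
    default:
        return -1; /* unknown system call number */
    }
}
```

With this sketch, toy_dispatch(TOY_SYS_READ, 0, (long)buffer, 1024) corresponds to the assembly snippet above, with buffer standing in for the address 0xDEADBEEF.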

To "do its stuff", let's assume it's reading from disk, and not a pipe or stdin or some other non-physical place. Let's also assume it's reading from the primary hard disk. Also, let's assume the operating system can still access the BIOS interrupts.

To access the file, it needs to do a bunch of filesystem things, for example traversing the directory tree to find where the actual file is. I'm not going to cover this in much detail, since I bet you can guess.

The interesting part is reading data from the disk, whether it be filesystem metadata, file contents, or something else. First, you get a logical block address (LBA). An LBA is just the index of a block of data on the disk. Each block is usually 512 bytes (although this figure may be dated). Still assuming we have access to the BIOS and the OS uses it, the OS will then convert the LBA to CHS notation. CHS (Cylinder-Head-Sector) notation is another way to reference parts of the hard drive. It used to correspond to the physical layout of the disk; nowadays it's outdated, but almost every BIOS still supports it. From there, the OS will stuff data into registers and trigger interrupt 0x13, the BIOS's disk services interrupt.
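The LBA-to-CHS conversion is plain integer arithmetic. Below is a small sketch of the commonly used formula; the struct and function names are made up, and the geometry (heads per cylinder, sectors per track) is assumed to be supplied by the caller rather than queried from a BIOS.

```c
#include <stdint.h>

/* A cylinder/head/sector triple. Sectors are numbered from 1, not 0. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/* Convert a logical block address to CHS for a disk with the given
   (assumed) geometry. Both geometry arguments must be non-zero. */
struct chs lba_to_chs(uint32_t lba,
                      uint32_t heads_per_cylinder,
                      uint32_t sectors_per_track)
{
    struct chs out;
    out.cylinder = lba / (heads_per_cylinder * sectors_per_track);
    out.head = (lba / sectors_per_track) % heads_per_cylinder;
    out.sector = (lba % sectors_per_track) + 1;
    return out;
}
```

For example, with 16 heads and 63 sectors per track, LBA 63 maps to cylinder 0, head 1, sector 1.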

That's the lowest level I can explain, and I'm sure the part after I assumed the operating system used the BIOS is outdated. Everything before that is, I believe, how it still works, albeit at a simplified level.

bk1e
icktoofay
  • See my little ninja edit and my comment at the other answer. :) – Xeo May 20 '11 at 02:21
  • Stack Overflow is merging some of my edits so the edit comments may not be as correct as they could be. – icktoofay May 20 '11 at 02:33
  • Wrt. `HANDLE` on Windows, it's similar to Unix. They can represent pipes and communication ports. However, weirdly enough they can't represent sockets. – MSalters May 20 '11 at 09:03
  • @MSalters: I was aware they couldn't represent sockets, but I didn't know they could represent other things other than files. – icktoofay May 21 '11 at 02:10
2

At the lowest level, on POSIX platforms, open files are represented by "descriptors" in userspace. A file descriptor is just an integer that is unique among a process's open files at any given time. The descriptor is used to identify which open file an operation should be applied to when asking the kernel to actually perform that operation. So, read(0, charptr, 1024) does a read from the open file associated with descriptor 0 (by convention, this will probably be the process's standard input).

As far as userspace can tell, the only parts of a file that are loaded into memory are those that are required to satisfy an operation like read. To read bytes from the middle of a file, another operation is supported: "seek". This tells the kernel to reposition the offset in a particular open file. The next read (or write) operation will work with bytes from that new offset. So lseek(123, 100, SEEK_SET) repositions the offset for the file associated with 123 (a descriptor value I just made up) to byte offset 100, that is, 100 bytes past the start of the file. The next read on 123 will read starting from that position, not from the beginning of the file (or wherever the offset was previously). And any bytes not read don't need to be loaded into memory.
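A minimal sketch of the seek-then-read pattern, assuming a POSIX system. The helper name read_at is invented for this example; it opens the file, repositions the offset with lseek, and reads only the requested bytes.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to count bytes starting at byte offset `offset` of the file
   at `path` into buf. Returns the number of bytes read (possibly fewer
   than count near the end of the file), or -1 on error. */
ssize_t read_at(const char *path, off_t offset, void *buf, size_t count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    /* Move the file offset; nothing before it is copied to our buffer. */
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1)
        n = read(fd, buf, count);

    close(fd);
    return n;
}
```

POSIX also provides pread, which takes the offset as an argument and combines the two steps without changing the descriptor's current offset.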

There is a little more complexity behind the scenes: the disk usually can't read less than a "block", whose size is typically a power of two such as 4096 bytes, and the kernel probably does extra caching and something called "readahead". But these are optimizations, and the basic idea is what I described above.
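To see why a small read can still touch a whole block (or two), here is a sketch of the arithmetic for which blocks a byte range falls into. The names are hypothetical, and the block size is passed in as a parameter because the real value depends on the device and filesystem.

```c
#include <stdint.h>

/* Indexes of the first and last block touched by a read. */
struct block_range {
    uint64_t first;
    uint64_t last;
};

/* Which blocks must be fetched to satisfy a read of `length` bytes at
   byte `offset`? Both length and block_size must be greater than zero. */
struct block_range blocks_for_read(uint64_t offset, uint64_t length,
                                   uint64_t block_size)
{
    struct block_range r;
    r.first = offset / block_size;
    r.last = (offset + length - 1) / block_size;
    return r;
}
```

With 4096-byte blocks, a 10-byte read at offset 4090 straddles a block boundary, so it touches blocks 0 and 1 even though only 10 bytes were requested.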

Jean-Paul Calderone
  • Added a little ninja-edit, as I'm really interested in what goes on "behind the scenes". – Xeo May 20 '11 at 02:20