My question is regarding the following paragraph on page 15 (Section 1.5) of The C Programming Language (2e) by Kernighan and Ritchie (emphasis added):

The model of input and output supported by the standard library is very simple. Text input or output, regardless of where it originates or where it goes to, is dealt with as a stream of characters. A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character. It is the responsibility of the library to make each input or output stream conform to this model; the C programmer using the library need not worry about how lines are represented outside the program.

I'm unsure what is meant by the text in bold, especially the sentence "it is the responsibility of the library to make each input or output stream conform to this model." Could someone please help me understand what this means?

At first, I thought it had something to do with the line-buffering of stdin that I see when I call getchar() while stdin is empty, but then I learned that the buffering mode varies across implementations (see here). So I don't think buffering is what the text in bold refers to when it talks about conforming to the text stream model.

John Kugelman
user51462

1 Answer

Consider running code like printf("hello world"); in the firmware of a USB device, and suppose that whatever characters you pass to printf are sent over USB from the device to the computer. The USB protocol requires the characters to be split into groups called packets, and the maximum packet size depends on how your USB hardware and descriptors are configured. For efficiency, you want to fill each packet whenever possible, because sending a packet that is smaller than the maximum size means the computer will stop letting you send more data for a while. If the computer doesn't receive a packet, you might need to re-send it. And if your USB packet buffers are already full, you might need to wait until one of them has been sent.

To make programming in C a manageable task, the implementation of printf needs to handle all of these details so that the user doesn't need to worry about them when calling printf. For example, it would be really bad if printf were only able to send a single packet of 1 to 8 bytes per call, and so returned an error whenever you gave it more than 8 characters.
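
To make that concrete, here is a minimal sketch of a stream-style write sitting on top of a packet-only primitive. Everything in it is hypothetical and only for illustration: send_packet, stream_write, the 8-byte MAX_PACKET limit, and the in-memory "wire" that stands in for real USB hardware. It also leaves out the retry and buffer-full handling described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PACKET 8          /* hypothetical maximum packet size, in bytes */
#define WIRE_CAPACITY 64      /* capacity of the stand-in "wire" below */

/* Stand-in for the hardware: records what was "sent" so the sketch can run
   on a PC. A real driver would hand the bytes to a USB endpoint instead. */
static unsigned char wire[WIRE_CAPACITY];
static size_t wire_len = 0;
static size_t packets_sent = 0;

/* Hypothetical low-level primitive: sends one packet of 1..MAX_PACKET bytes.
   Returns 0 on success, -1 on failure. */
static int send_packet(const unsigned char *data, size_t len)
{
    assert(len >= 1 && len <= MAX_PACKET);
    if (wire_len + len > WIRE_CAPACITY)
        return -1;                       /* pretend the link failed */
    memcpy(wire + wire_len, data, len);
    wire_len += len;
    packets_sent++;
    return 0;
}

/* Stream-style interface: accepts any number of bytes and splits them into
   full packets, with one short packet at the end if needed. */
static int stream_write(const char *text, size_t len)
{
    while (len > 0) {
        size_t chunk = len < MAX_PACKET ? len : MAX_PACKET;
        if (send_packet((const unsigned char *)text, chunk) != 0)
            return -1;
        text += chunk;
        len -= chunk;
    }
    return 0;
}
```

With this in place, a caller can write stream_write("hello world", 11) without knowing that the 11 bytes go out as one 8-byte packet followed by one 3-byte packet.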

This is called an abstraction: the underlying system has some complexity (like USB endpoints, packets, buffers, retries). You don't want to think about those details all the time, so you write a library that turns them into a more abstract interface (like a stream of characters). Or you just use a "standard library" written by someone else that takes care of that for you.

If you want a more PC-centric example... I believe that printf is implemented on many systems by calling the write system call. Since write isn't always guaranteed to actually write all of the data you give it, the implementation of printf needs to try multiple times to write the data you give it. Also, for efficiency, the printf implementation might buffer the data you give it in RAM for a while before passing it to the kernel with write. You don't generally have to worry about retrying or buffering details while programming in C because once your program terminates or you flush the buffer, the standard library makes sure all your data has been written.
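
The retry idea can be sketched as a small loop around the POSIX write call. This is only an illustration of the technique, not how any particular C library implements printf; the helper name write_all is made up for this example, and it assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>   /* write(); POSIX only */

/* Keeps calling write() until all `len` bytes have been accepted or a real
   error occurs. Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: try again */
            return -1;
        }
        buf += n;                /* skip the part that was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A call such as write_all(STDOUT_FILENO, "hello world\n", 12) hides short writes from the caller in the same way the standard library hides them behind printf.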

David Grayson
  • Thank you @David Grayson, I understand that text streams are an abstraction of I/O processes, but I wasn't sure why the authors define a text stream as a sequence of characters **"divided into lines separated by a newline character."** Why the mention of lines and not just "a sequence of characters"? Are they just saying that the functions in `stdio` only operate on two units of input: characters and lines? E.g. `getchar` returns a character, whereas `fgets` returns a line. – user51462 Nov 02 '22 at 07:34
  • ...so by defining text stream in this way, the authors are basically saying that if an implementation is to conform with the text stream model, it can only work with characters and lines - is this correct? – user51462 Nov 02 '22 at 07:43
  • It doesn't say "separated" in your quote. It says each line *ends* with a newline character. This means that the programmer can always count on the last character of the abstract stream being a newline (even if that newline doesn't really exist outside the program), which might simplify some programs you want to write, or make it easier to define the behavior of other standard library functions like `getline`. I don't understand the meaning of your second question. – David Grayson Nov 02 '22 at 15:15
  • Sorry, I should have said "end" instead of "separated". When you say "even if that newline doesn't really exist outside the program", are you referring to how different systems have different representations of the newline character (e.g. `\r\n` on Windows)? – user51462 Nov 03 '22 at 03:24
  • RE my second question: I think what confused me about the quote in my post was why the authors were defining a text stream in terms of characters *and lines* instead of just defining it as a sequence of characters. **Why did they need to mention lines?** I think I understand now that, given the different line-ending conventions, the authors needed some way to abstract away those differences, so that `stdio` functions that operate on lines (e.g. `getline`, `fgets`) will work across all systems, regardless of how that system represents a newline. – user51462 Nov 03 '22 at 03:24
  • So, in the quote, I think the authors are just saying that it is the implementation's responsibility to ensure that the input stream conforms to C/UNIX's text stream model. This may require the implementation to transform the input stream (e.g. if I input a line on a Windows system, then the implementation must convert the `\r\n` at the end of my line to a `\n`), but the resulting stream must be a series of lines ending in `\n` before it is placed in `stdin`. – user51462 Nov 03 '22 at 03:25
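
The `\r\n` to `\n` translation described in the last comment can be sketched as a small filter. This is a simplified illustration of what a library might do for a text stream on a system that stores line endings as `\r\n`, not the code of any real implementation; the function name `crlf_to_lf` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copies `n` bytes from `in` to `out`, turning each "\r\n" pair into a single
   '\n'. `out` must have room for at least `n` bytes. Returns the number of
   bytes written to `out`. */
static size_t crlf_to_lf(const char *in, size_t n, char *out)
{
    assert(in != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;            /* drop the '\r'; the '\n' is copied next */
        out[w++] = in[i];
    }
    return w;
}
```

Given the 8 bytes `ab\r\ncd\r\n`, the filter produces the 6 bytes `ab\ncd\n`, so the program sees two lines that each end in a single `\n`, whatever the external representation was.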