I am starting to learn Haskell, and I am trying to understand how much work functions actually do (especially with respect to laziness). Consider the following program:

main :: IO ()
main = interact (head . words)

Will this program read all of the input, or only the first word?

dmg

2 Answers

Just the first word:

% yes | ghc -e 'interact (head . words)'
y
%

But beware: this relies on a feature called "lazy IO", which is only loosely related to laziness in pure code. Pure functions are lazy by default, and you must work hard to make them strict; IO is strict by default, and you must work hard to make it lazy. A handful of library functions (notably interact, getContents, hGetContents, and readFile) have gone to this effort.

Lazy IO also has some problems with composability.
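To make the distinction concrete, here is a minimal sketch (not part of the original answer). The first half shows pure laziness: an infinite String stands in for the output of yes, and only the prefix needed for the first word is evaluated. The second half shows one commonly cited composability problem with lazy IO: the handle is closed before the lazily read contents are demanded. The helper name firstWord and the scratch file name lazy-io-demo.txt are assumptions made up for this illustration, and the exact result of the second half may depend on the GHC/base version, so no output is claimed for it.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- The pure part of the program from the question (hypothetical name).
firstWord :: String -> String
firstWord = head . words

main :: IO ()
main = do
  -- Pure laziness: `cycle "y\n"` is an infinite String standing in for the
  -- output of `yes`; only the prefix needed for the first word is evaluated.
  putStrLn (firstWord (cycle "y\n"))

  -- Lazy IO pitfall: hGetContents returns before reading anything, and
  -- withFile closes the handle as soon as its action returns.
  writeFile demoPath "hello world\n"
  contents <- withFile demoPath ReadMode hGetContents
  -- The string is only forced here, after the handle has been closed.
  result <- try (evaluate (length contents)) :: IO (Either IOException Int)
  print result
  where
    demoPath = "lazy-io-demo.txt" -- hypothetical scratch file
```

The first line printed is y. What the last line prints is left for you to observe; the point is only that the read is attempted after withFile has already closed the handle.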

Daniel Wagner
  • Thank you. I also appreciate the depth of the answer, including how to test it. – dmg May 24 '18 at 00:08

Conceptually, it reads only what it needs. But it probably uses a buffer to do so:

$ yes | strace -feread,write ghc -e 'interact (head . words)'
...
[pid 61274] read(0, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8096) = 8096
[pid 61272] write(1, "y", 1y)            = 1
[pid 61272] --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=0, ptr=0}} ---
[pid 61272] write(5, "\376", 1)         = 1
[pid 61273] read(4, "\376", 1)          = 1
[pid 61273] +++ exited with 0 +++
[pid 61274] +++ exited with 0 +++
[pid 61276] +++ exited with 0 +++
+++ exited with 0 +++

This shows (on a Linux system) that the program ran as several threads: one of them read 8096 bytes (roughly 8 KiB) from stdin in a single call, and another then wrote the first word. The runtime reads in large chunks mainly because repeatedly reading small amounts is quite inefficient. Asynchronous sources like terminals and sockets may deliver less data per read, though:

$ strace -f -e trace=read,write -e signal= ghc -e 'interact (head . words)'
...
hello program
Process 61594 attached
[pid 61592] read(0, "hello program\n", 8096) = 14
[pid 61590] write(1, "hello", 5hello)        = 5

In this case, the terminal layer completed the read at the first newline, returning 14 bytes even though the program had asked for up to 8096. As this was enough data to identify the first word, no further reads were needed.
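If you want to experiment with the buffering yourself, one option is to change the buffer mode on stdin before calling interact and then compare the read calls under strace. The following is only a sketch under assumptions: the buffer size of 64 is an arbitrary value picked for the experiment, firstWord is a hypothetical helper, and how the requested size maps onto actual read calls is up to the GHC runtime, so no particular trace is claimed here.

```haskell
import System.IO (BufferMode (BlockBuffering), hSetBuffering, stdin)

-- Like `head . words`, but returns "" instead of failing on empty input
-- (hypothetical helper, not from the original program).
firstWord :: String -> String
firstWord = concat . take 1 . words

main :: IO ()
main = do
  -- Request a much smaller buffer than the default; 64 is an arbitrary
  -- value chosen only for this experiment.
  hSetBuffering stdin (BlockBuffering (Just 64))
  interact firstWord
```

Compiling this and running it as yes | strace -f -e trace=read,write ./program lets you compare the size argument of the read calls against the trace above.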

Yann Vernier