I found this answer from another post, https://stackoverflow.com/a/1450396/1810962, which almost does what I want:

import sys
data = sys.stdin.readlines()
preProcessed = map(lambda line: line.rstrip(), data)

I can now operate on the lines in data in a functional way by applying filter, map, etc. However, readlines() loads all of standard input into memory. Is there a lazy way to build a stream of lines?

joseph
  • what is wrong with `input` – sid-m Dec 01 '18 at 14:43
  • `sys.stdin` already *is* a lazy iterator of the lines in the file. `len(sys.stdin)` doesn't work, but something more explicit like `sum(1 for x in sys.stdin)` would. `len` specifically requires something more concrete than an arbitrary iterator. – chepner Dec 01 '18 at 14:46
  • Let me update. I don't care about the len(data) part; I don't want to compute the length. I want to operate on a stream of lines. – joseph Dec 01 '18 at 14:53
  • I understand that sys.stdin is lazy, but sys.stdin.readlines() is not – joseph Dec 01 '18 at 14:55
  • what is `input` ? – joseph Dec 01 '18 at 15:07
  • Never mind `input()`, it's how you read _one_ line from stdin. I don't see how it is relevant here. – alexis Dec 01 '18 at 15:43
  • Not sure why you're so keen on doing something inherently effectful in a 'functional' way.... – Jared Smith Dec 03 '18 at 00:15
  • @JaredSmith: Not sure why you think streaming from standard in is inherently effectful. – joseph Dec 03 '18 at 16:53
  • @joseph because reading from stdin *is* a side effect. There's no referential transparency: you could get anything from stdin. Ditto for any other form of I/O, which is why in e.g. Haskell you have to wrap it in the IO monad. Your question conceptually doesn't quite meet at the seams for me. – Jared Smith Dec 03 '18 at 16:56
  • @JaredSmith: Suppose I want to compute the sum of the numbers in standard input. Even though standard input lacks referential transparency, that does not prevent me from filtering out non-number lines and computing the sum of the remaining lines using reduce. – joseph Dec 03 '18 at 17:16
  • @joseph you can certainly transform the data you get from stdin in ways that are more functional or less, I get it. I didn't downvote or vote to close your question (and I DV and VTC a *lot* of questions in this tag of the form "how do I do X but with functional programming"). I just think the wording of the question (again, this is just written feedback; I didn't DV) focuses on entirely the wrong thing. STDIN is an immovable, unchangeable piece of the puzzle and as such not much worth talking about. It's the data transformation that's interesting. – Jared Smith Dec 03 '18 at 17:28
  • This question is not a duplicate of the existing questions. The first link is for reading files. I'm reading from standard in. The second question is not a duplicate since they merely want to read from standard in. They did not qualify it with a lazy read. This question specifically requests not loading the entire standard in into memory. – joseph Feb 08 '23 at 21:24

1 Answer

Just iterate over sys.stdin: it is already a lazy iterator that yields one line at a time.

You can then stack generator expressions, or use map and filter if you prefer (in Python 3, both return lazy iterators). Each incoming line passes through the whole pipeline on its own; no list is built along the way.

Here are examples of each:

import sys

stripped_lines = (line.strip() for line in sys.stdin)
lines_with_prompt = ('--> ' + line for line in stripped_lines)
uppercase_lines = map(lambda line: line.upper(), lines_with_prompt)
lines_without_dots = filter(lambda line: '.' not in line, uppercase_lines)

for line in lines_without_dots:
    print(line)

And in action, in the terminal:

thierry@amd:~$ ./test.py 
My first line
--> MY FIRST LINE 
goes through the pipeline
--> GOES THROUGH THE PIPELINE
but not this one, filtered because of the dot. 
This last one will go through
--> THIS LAST ONE WILL GO THROUGH
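
To check that nothing is read ahead of time, the same kind of pipeline can be run on a stand-in for sys.stdin. The sketch below is only an illustration, assuming Python 3: `fake_stdin` and `consumed` are hypothetical names, and a generator that records each line it hands out replaces standard input so the reads can be counted.

```python
from itertools import islice

consumed = []  # hypothetical log of the lines actually pulled from the source


def fake_stdin():
    """Stand-in for sys.stdin: yields lines one at a time and records each pull."""
    for line in ["first line\n", "second. line\n", "third line\n", "fourth line\n"]:
        consumed.append(line)
        yield line


stripped_lines = (line.strip() for line in fake_stdin())
uppercase_lines = map(str.upper, stripped_lines)
lines_without_dots = filter(lambda line: '.' not in line, uppercase_lines)

# Building the pipeline has not read anything yet.
pulled_before = len(consumed)

# Ask for only the first two results; the fourth line is never read.
first_two = list(islice(lines_without_dots, 2))
pulled_after = len(consumed)

print(first_two)
print(pulled_before, pulled_after)
```

Only three of the four lines are pulled: the second one is read and then dropped by the filter, and the fourth is never requested.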

A shorter example with map only, where map will iterate on the lines of stdin:

import sys

uppercase_lines = map(lambda line: line.upper(), sys.stdin)

# Each line keeps its trailing newline, so print() adds a blank line after it.
for line in uppercase_lines:
    print(line)

In action:

thierry@amd:~$ ./test2.py 
this line will turn
THIS LINE WILL TURN

to uppercase
TO UPPERCASE
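
The use case raised in the comments on the question, summing the numbers found on standard input while skipping non-numeric lines, fits the same pattern. This is a minimal sketch rather than a definitive implementation: `is_number` and `sum_numbers` are hypothetical helper names, a "number" is assumed to be anything float() accepts, and io.StringIO stands in for sys.stdin so the example runs without piped input.

```python
import io
from functools import reduce


def is_number(text):
    """Return True if float() can parse text (the assumed meaning of 'number')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def sum_numbers(lines):
    """Lazily strip, filter and convert the lines, then fold them into a sum."""
    stripped = map(str.strip, lines)
    numeric = filter(is_number, stripped)
    values = map(float, numeric)
    return reduce(lambda total, value: total + value, values, 0.0)


# io.StringIO stands in for sys.stdin here; in a script, pass sys.stdin instead.
fake_stdin = io.StringIO("1\n2.5\nnot a number\n\n4\n")
total = sum_numbers(fake_stdin)
print(total)  # 7.5
```

Because map and filter are lazy, reduce pulls one line at a time through the chain, so only the running total is kept in memory.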
Thierry Lathuille
  • Thanks. I didn't know sys.stdin's iterates line by line. I was able to apply your solution to my program. – joseph Dec 03 '18 at 17:08
  • "Thanks. I didn't know sys.stdin's iterates line by line." - when chepner said initially, "`sys.stdin` already *is* a lazy iterator of the lines in the file.", **that's what that meant**. – Karl Knechtel Feb 06 '23 at 10:29