28

I am having trouble piping stdin to an R script.

Here is my toy script test.R:

#!/usr/bin/env Rscript
while(length(line <- readLines('stdin', n=1, warn=FALSE)) > 0) {
  write(line, stderr())
  # process line
}

I'd like to go through each line and do some processing. Here is my input file named input:

aaaaaa
bbbbbb
cccccc
dddddd
eeeeee
ffffff

If I do

cat input | test.R

I only get:

aaaaaa

Is there anything that I missed?

zahypeti
WYi

3 Answers

47

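This does not happen if you explicitly open the stdin connection once and read from it repeatedly. When readLines() is given the string 'stdin', each call opens its own connection and closes it on return, so input buffered beyond the first line is presumably discarded.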

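#!/usr/bin/env Rscript
# Open stdin once and keep reading from the same connection.
f <- file("stdin")
open(f)
while(length(line <- readLines(f,n=1)) > 0) {
  write(line, stderr())
  # process line
}
close(f)  # release the connection when done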
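As a minimal sketch, the same loop can be wrapped in a function that accepts any open connection, which makes it easy to try without a pipe. The names `process_lines` and `handler` are hypothetical and not part of the answer above; the in-memory `textConnection` only stands in for stdin.

```r
# Hypothetical helper: read an open connection one line at a time
# and call `handler` on each line.
process_lines <- function(con, handler) {
  n <- 0L
  # `<-` assigns inside the call and yields the value; `=` here would be
  # parsed as a named argument to length().
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    handler(line)
    n <- n + 1L
  }
  n  # number of lines processed
}

# Try it on an in-memory connection instead of stdin.
tc <- textConnection(c("aaaaaa", "bbbbbb", "cccccc"))
seen <- character(0)
count <- process_lines(tc, function(l) seen <<- c(seen, l))
close(tc)
```

In a script, the connection would instead come from `f <- file("stdin"); open(f)`, followed by `close(f)` once the loop has finished.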
Vincent Zoonekynd
  • 3
    Do we need to close file in the end? – B.Mr.W. Nov 16 '14 at 23:08
  • 2
    If you want R to do the more typical "unix-y" thing and wait for input from stdin (so the code in the answer would behave similarly to running `cat` with no arguments) then you need to use `open(f, blocking=TRUE)`. – dshepherd Mar 26 '15 at 12:07
  • 6
    Also a tip for people (like me) who prefer `=` over `<-`: the `<-` *cannot* be replaced by `=` in `length(line <- readLines(f, n=1))`. – dshepherd Mar 26 '15 at 13:09
  • 1
    @dshepherd, the method in the answer (`file("stdin")`) is blocking by default. At least, the help page for `file` has a parameter `blocking = TRUE` – Aaron McDaid Aug 19 '15 at 13:07
  • 1
    I guess this is a little off topic, but what if you have an actual file called `stdin`. Would you have to do `file("./stdin")`, or something like that, to access it? – Aaron McDaid Jun 29 '16 at 15:23
  • @AaronMcDaid: yes, that is what I would do. – Vincent Zoonekynd Jun 29 '16 at 16:18
  • For whatever reason this doesn’t work for me on a Linux system. I have to use `readLines("stdin")`. I’ve tried countless other variations, none worked; either no data was read or an error (invalid connection) was thrown. – Konrad Rudolph Nov 18 '16 at 18:14
14

Jeff and I wrote littler to do just this (and a few other things). Because of littler, I never looked that closely at Rscript -- but this should in principle work just fine.

Here is one of our early examples, using output from /bin/ls (and a quick filter by awk) to summarize file size:

edd@max:~/svn/littler/examples$ ls -l /boot/ | \
                                    awk '!/^total/ {print $5}' | ./fsizes.r 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
      24   130300   730700  3336000  4527000 14670000 

  The decimal point is 6 digit(s) to the right of the |

   0 | 0000000000000011111111122777777777
   2 | 777777777
   4 | 555577777
   6 | 
   8 | 
  10 | 
  12 | 5
  14 | 24466677

edd@max:~/svn/littler/examples$ 

Here the script fsizes.r is just three lines:

edd@max:~/svn/littler/examples$ cat fsizes.r 
#!/usr/bin/r -i

fsizes <- as.integer(readLines())
print(summary(fsizes))
stem(fsizes)
edd@max:~/svn/littler/examples$ 
Dirk Eddelbuettel
  • 1
    readLines() reads all the lines into memory, which is what I am trying to avoid. I hope to read line by line, with n=1 in readLines() – WYi Feb 21 '12 at 01:20
  • So put an awk/sed/grep/... filter in the pipe, or dump to file and select. R does indeed want all its input read. – Dirk Eddelbuettel Feb 21 '12 at 01:22
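One possible middle ground between reading everything at once and reading with n=1 is to read a bounded number of lines per call and keep only a running summary. The sketch below assumes one number per line; `sum_in_chunks` and `chunk_size` are hypothetical names, and the `textConnection` stands in for an opened stdin connection.

```r
# Hypothetical helper: accumulate a count and a sum from a connection,
# holding at most `chunk_size` lines in memory at a time.
sum_in_chunks <- function(con, chunk_size = 1000L) {
  total <- 0
  count <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0) break      # end of input
    values <- as.numeric(chunk)        # assumption: one number per line
    total <- total + sum(values)
    count <- count + length(values)
  }
  list(count = count, total = total, mean = total / count)
}

# In-memory stand-in for piped input.
tc <- textConnection(as.character(1:10))
res <- sum_in_chunks(tc, chunk_size = 3L)
close(tc)
```

Only statistics that can be updated incrementally (counts, sums, means) fit this pattern; quantiles or a stem plot as in fsizes.r still need the full data.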
5

If the input consists of numbers, you can use:

x <- scan("stdin")

You can test it with:

$ echo -e "1\n2\n3" | R -s -e 'x <- scan("stdin"); summary(x)'
Read 3 items
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.5     2.0     2.0     2.5     3.0 

Adapted from this answer and tested in R 4.2.2.
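A small sketch of the same call on an in-memory connection, which can be run without a pipe. The `textConnection` inputs are stand-ins for stdin, and the second half, using `what = character()`, is an illustrative extension for non-numeric tokens rather than something taken from the answer.

```r
# In-memory stand-in for piped input (assumption: one value per line).
tc <- textConnection(c("1", "2", "3"))
x <- scan(tc, quiet = TRUE)   # numeric (double) is scan()'s default type
close(tc)

# For non-numeric, whitespace-separated tokens, request character data.
tc <- textConnection(c("aaaaaa", "bbbbbb"))
words <- scan(tc, what = character(), quiet = TRUE)
close(tc)
```

Note that scan() reads all of its input in one call, so it does not address the line-by-line requirement from the question.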

mmoya