
I searched for an explanation of what puzzles me and found only this: Do some programs not accept process substitution for input files?

That helps partially, but I would really like to understand the full story. I noticed that some of my R scripts give different (i.e. wrong) results when I use process substitution.

I tried to pinpoint the problem with a test case:

This script:

#!/usr/bin/Rscript

args  <- commandArgs(TRUE)
file  <-args[1]
cat(file)
cat("\n")
data <- read.table(file, header=F)
cat(mean(data$V1))
cat("\n")

with an input file generated in this way:

$ for i in `seq 1 10`; do echo $i >> p; done
$ for i in `seq 1 500`; do cat p >> test; done

leads me to this:

$ ./mean.R test
test
5.5

$ ./mean.R <(cat test)
/dev/fd/63
5.501476

Further tests reveal that some lines are lost, but I would like to understand why. Does read.table use seek? (scan gives the same results.)

PS: with a smaller test file (100), an error is reported instead:

$ ./mean.R <(cat test3)
/dev/fd/63
Error in read.table(file, header = F) : no lines available in input
Execution halted

Add #1: with a modified script that uses scan the results are the same.

vodka
  • `read.table` might peek into the file to determine column formats and then fail when seeking back to the beginning. (Just a wild guess.) What happens if you `cat(head(data$V1))` in your R script? – krlmlr Apr 03 '13 at 10:18
  • With process substitution it gives `2 3 4 5 6 7`; without it, `1 2 3 4 5 6`. Printing the whole data frame gives 5001 lines without it (correctly) and 3050 with it. I also think that seek could be the problem, but shouldn't it report an error instead of carrying on with partial data? – vodka Apr 03 '13 at 10:47
  • Moving back and forth in the file is definitely the problem: https://stat.ethz.ch/pipermail/r-help/2007-September/141769.html but I still believe that an error should be reported, and maybe I will file a bug. I still have to investigate whether the same happens with scan (which gives me the same wrong results as read.table). – vodka Apr 03 '13 at 11:04

1 Answer


I have written this general purpose function for opening a file connection in my own scripts:

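OpenRead <- function(arg) {
   if (arg %in% c("-", "/dev/stdin")) {
      # Standard input: "stdin" refers to the process's C-level stdin
      file("stdin", open = "r")
   } else if (grepl("^/dev/fd/", arg)) {
      # Process substitution hands over a /dev/fd/N path: open it as a FIFO
      fifo(arg, open = "r")
   } else {
      # Anything else is treated as a regular file
      file(arg, open = "r")
   }
}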

In your code, pass read.table a connection instead of the path: add file <- OpenRead(file) just before the read.table call. It should then handle all of the following:

./mean.R test
./mean.R <(cat test)
cat test | ./mean.R -
cat test | ./mean.R /dev/stdin
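
For reference, here is a minimal sketch of how the question's mean.R might look with OpenRead wired in. The helper name MeanOfFirstColumn is hypothetical (it is not part of the original script), and the sketch assumes a single-column numeric input as in the test case. The connection is opened only after the path has been printed, so the first line of output still shows the path.

```r
#!/usr/bin/Rscript

# Open a read connection suited to the kind of path given.
OpenRead <- function(arg) {
   if (arg %in% c("-", "/dev/stdin")) {
      file("stdin", open = "r")
   } else if (grepl("^/dev/fd/", arg)) {
      fifo(arg, open = "r")
   } else {
      file(arg, open = "r")
   }
}

# Hypothetical helper: mean of the first column of a headerless table.
MeanOfFirstColumn <- function(path) {
   con <- OpenRead(path)
   on.exit(close(con))   # we opened the connection, so we close it
   data <- read.table(con, header = FALSE)
   mean(data$V1)
}

args <- commandArgs(TRUE)
if (length(args) >= 1) {
   cat(args[1], "\n", sep = "")                     # print the path as given
   cat(MeanOfFirstColumn(args[1]), "\n", sep = "")  # then the mean
}
```

Because read.table is handed an already open connection, it reads from it sequentially rather than opening the path itself, which seems to be what avoids the lost lines.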
flodel
  • Yep, this works (and as a matter of fact I noticed problems using "/dev/stdin" and switched to "stdin" some time ago). I still believe that this could be seen as a bug in R though... – vodka Apr 03 '13 at 11:17
  • Only 5 years after the fact, but confused about where to put `file <- OpenRead(file)`. In the OP example, would it come after `file <- args[1]`? – beroe Oct 08 '18 at 20:46