I'm not too familiar with data.table's fread function, but it makes quick work of reading my data, so now I'm intrigued. At URL "http://www.retrosheet.org/CurrentNames.csv", there is a simple csv file. The following two calls work fine.

readLines("http://www.retrosheet.org/CurrentNames.csv", n = 2)
# [1] "ANA,LAA,AL,,Los Angeles,Angels,,4/11/1961,9/1/1965,Los Angeles,CA"
# [2] "ANA,CAL,AL,,California,Angels,,9/2/1965,9/29/1968,Anaheim,CA"
rcsv <- read.csv("http://www.retrosheet.org/CurrentNames.csv", header = FALSE)

But fread prints a download message, and I can't seem to turn it off with `showProgress = FALSE`. I could use `suppressMessages()`, but I don't really want to.

library(data.table)
dtf <- fread("http://www.retrosheet.org/CurrentNames.csv", 
             header = FALSE, showProgress = FALSE)
# trying URL 'http://www.retrosheet.org/CurrentNames.csv'
# Content type 'text/plain' length 7729 bytes
# opened URL
# ==================================================
# downloaded 7729 bytes

Can anyone explain this, and can I turn it off in the fread arguments?

It looks like a call to download.file has occurred somewhere. Why wouldn't fread just read the URL the same way as read.csv?

Rich Scriven
  • :) how do you imagine you can "read straight from URL" and not download? But this is a good FR to add `quiet=TRUE` to `download.file` if `showProgress` is `FALSE` - you should add it to github. – eddi Jul 24 '14 at 15:04
  • @eddi - That was a bit of sarcasm, we did that conversation yesterday. :) But I also think a `quiet = TRUE` option would be a good addition to `fread`. And in all honesty, I'm not real versed on memory, storage, remote servers, etc. I just like baseball and R so I'm writing a kind of "learn as I go" package. – Rich Scriven Jul 24 '14 at 15:31
  • [Feature request](https://github.com/Rdatatable/data.table/issues/741) submitted – Rich Scriven Jul 24 '14 at 15:49

1 Answer

Update Oct 2014. Now in v1.9.5:

fread now passes showProgress=FALSE through to download.file() as quiet=!showProgress. Thanks to Karl Broman for the pull request and to Richard Scriven for filing the issue, #741.
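
As a minimal sketch of what that looks like in practice, assuming a data.table version that includes this change: the example below uses a small local file exposed through a `file://` URL as a hypothetical stand-in for the Retrosheet URL, so that it runs without network access. The variable names are illustrative only.

```r
library(data.table)

# Hypothetical stand-in for the remote CSV: a tiny local file served via file://
src <- tempfile(fileext = ".csv")
writeLines(c("ANA,LAA,AL", "ANA,CAL,AL"), src)

# Build a file:/// URL that works for both Unix ("/tmp/...") and Windows ("C:/...") paths
url <- sub("^/*", "file:///", normalizePath(src, winslash = "/"))

# showProgress = FALSE is forwarded as quiet = TRUE, so no download messages are printed
dtf <- fread(url, header = FALSE, showProgress = FALSE)
```

With the real URL from the question, the same `showProgress = FALSE` argument should suppress the "trying URL ..." output.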


Previous answer ...

It does download the file; here is the part of the code that does it.

else if (substring(input, 1, 7) %chin% c("http://", "https:/", 
    "file://")) {
    tt = tempfile()
    on.exit(unlink(tt), add = TRUE)
    download.file(input, tt)
    input = tt
}

My guess is that this is because fread makes more than one pass over the file: first to get the structure, then to actually read the whole thing in. Downloading to a temporary file once saves downloading it multiple times.
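
On a version where the download message cannot be switched off, one possible workaround is to do the same thing by hand: download to a temporary file with `quiet = TRUE`, then point fread at the local copy. This is only a sketch; `fread_quiet` is a hypothetical helper name, not part of data.table, and the local `file://` URL stands in for the real one so the example runs offline.

```r
library(data.table)

# Hypothetical helper (not part of data.table): download quietly, then fread the local copy
fread_quiet <- function(url, ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)                    # remove the temp file when done
  download.file(url, tmp, quiet = TRUE, mode = "wb")  # quiet = TRUE suppresses the messages
  fread(tmp, ...)                                     # remaining arguments go to fread
}

# Stand-in for the remote CSV so the sketch runs without network access
src <- tempfile(fileext = ".csv")
writeLines(c("ANA,LAA,AL", "ANA,CAL,AL"), src)
url <- sub("^/*", "file:///", normalizePath(src, winslash = "/"))

dtf <- fread_quiet(url, header = FALSE)
```

The same call with the Retrosheet URL would be `fread_quiet("http://www.retrosheet.org/CurrentNames.csv", header = FALSE)`.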

Matt Dowle
JeremyS
  • What we need is a `quiet = TRUE` passed to `download.file` – Rich Scriven Jul 24 '14 at 02:45
  • Well, you can try http://stackoverflow.com/questions/2458013/what-ways-are-there-to-edit-a-function-in-r – Gabor Csardi Jul 24 '14 at 03:06
  • I guess I could edit it and call it something else. I'm going to be using it in a few package functions so I may need to keep it as is as well. – Rich Scriven Jul 24 '14 at 03:11
  • @RichardScriven What do you mean, it's disguised as a URL? It's a file that lives on a remote server. Both `readLines` and `read.csv` are also downloading the file. They are just processing it as a stream (you have to be able to read the whole thing in memory). – MrFlick Jul 24 '14 at 03:11