6

I am getting the error below when reading the first n rows from a big file (around 50 GB) using fread. It looks like a memory issue. I tried nrows=1000, but no luck. I am on Linux.

file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.

Can the fread call below be replaced with read.csv using the same options? Would that help?

rdata <- fread(
    file = csvfile, sep = "|", header = FALSE, col.names = colsinfile,
    select = colstoselect, key = "keycolname", na.strings = c("", "NA"),
    nrows = 500
)
  • What if you replace `csvfile` with `paste('head -n 500', csvfile)`? – mt1022 Sep 25 '18 at 07:46
  • @mt1022: got an error `File 'head -n 500 /csvfile' doesnt exist` – sjd Sep 25 '18 at 07:54
  • The argument should finally look like `input = "head -n 500 /path/to/csvfile"`. Please use the `input` argument rather than the `file` argument to allow shell commands (see the sketch after these comments). I don't have a file that large to test, but I hope this works. – mt1022 Sep 25 '18 at 07:59
  • @mt1022: that's awesome. When used with `input` it works! You should post this as an answer. – sjd Sep 25 '18 at 09:28
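
For reference, the comment's suggestion written out as a full call (a sketch only, assuming `csvfile`, `colsinfile`, and `colstoselect` are defined as in the question; newer data.table versions prefer passing a shell command via `cmd=` rather than `input=`):

library(data.table)

# Pipe only the first 500 lines of the big file through `head`,
# then let fread parse that small chunk instead of memory-mapping the whole file.
rdata <- fread(
    input = paste("head -n 500", csvfile),
    sep = "|", header = FALSE, col.names = colsinfile,
    select = colstoselect, key = "keycolname", na.strings = c("", "NA")
)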

2 Answers


Another workaround is to fetch the first 500 lines with a shell command:

rdata <- fread(
    cmd = paste('head -n 500', csvfile),
    sep = "|", header = FALSE, col.names = colsinfile,
    select = colstoselect, key = "keycolname", na.strings = c("", "NA")
)

I don't know why nrows doesn't work, though.

mt1022

Perhaps this would help you:

processFile = function(filepath) {
  con = file(filepath, "r")          # open a read-only connection
  while ( TRUE ) {
    line = readLines(con, n = 1)     # read one line at a time
    if ( length(line) == 0 ) {       # end of file reached
      break
    }
    print(line)
  }
  close(con)
}

See Reading a text file in R line by line. In your case you'd probably want to replace the `while ( TRUE )` with `for (i in 1:1000)`.
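
A minimal sketch of that variant (the `read_first_n` helper name and its arguments are illustrative, not from the answer):

# Read at most the first n lines of a file, stopping early at end of file.
read_first_n = function(filepath, n = 1000) {
  con = file(filepath, "r")
  on.exit(close(con))                # ensure the connection is closed on exit
  lines = character(0)
  for (i in 1:n) {
    line = readLines(con, n = 1)
    if (length(line) == 0) {         # end of file before n lines
      break
    }
    lines = c(lines, line)
  }
  lines
}

first1000 <- read_first_n(csvfile, 1000)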

gaut