
I have a large .csv file which I need to import into R in order to do some data manipulation on it. I'm calling `read.csv("file.csv")` and assigning the result to a variable `MyData`. However, when I attempt to run this in the R REPL, the program crashes. Is there a way to efficiently and quickly process/read a .csv file in R that won't crash the terminal? If there isn't, should I just be using Python instead?
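For reference, a minimal sketch of what I'm running (the file name here is just a placeholder for my actual file):

```r
# Minimal sketch of the call described above; "file.csv" stands in for my real file
MyData <- read.csv("file.csv")
```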

Ali BAGHO
asdf asdf
  • You should also consider operating line-by-line with `read_lines` from the `readr` package – CPak Aug 09 '17 at 15:59
  • My approach to running queries on very large (compressed) csv files: https://stackoverflow.com/a/68693819/8079808 – San Aug 09 '21 at 10:34

1 Answer


R will crash if you try to load a file that is larger than your available memory, so check that you have at least 6 GB of RAM free (a 6 GB .csv takes up roughly 6 GB in memory as well). Python would run into the same problem; apparently someone asked essentially the same question about Python a few years ago.

For reading large .csv files, you should use either readr::read_csv() or data.table::fread(); both are much faster than base::read.table().
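As a minimal sketch (the file path is a placeholder), either of these reads the whole file much faster than base::read.csv():

```r
# Placeholder path; swap in your actual file
library(readr)
dat <- read_csv("file.csv")

# Or, with data.table:
library(data.table)
dt <- fread("file.csv")
```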

readr::read_csv_chunked() supports reading .csv files in chunks, so if you don't need all of your data at once, that might help. You could also try reading only the columns of interest, to keep the memory footprint smaller.
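A rough sketch of both ideas; the column names `id` and `value` and the filter threshold are made-up placeholders for whatever your data actually contains:

```r
library(readr)
library(data.table)

# Sketch only: `value` and the threshold 100 are placeholders.
# DataFrameCallback binds the filtered chunks back into one data frame.
filtered <- read_csv_chunked(
  "file.csv",
  callback = DataFrameCallback$new(function(chunk, pos) {
    chunk[chunk$value > 100, ]
  }),
  chunk_size = 100000
)

# Reading only the columns you actually need also keeps memory down:
subset_dt <- fread("file.csv", select = c("id", "value"))
```

Note that the chunked approach only helps if what you keep from each chunk (the filtered rows, or the selected columns) fits into memory in total.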

Stefan F
  • 1
    what does an implementation of readr::read_csv_chunked look like? How can I take all of the chunks I get from the large .csv and concatenate them together at the end of the program? – asdf asdf Aug 09 '17 at 16:06
  • 1
    That is the problem, you can't if it doesn't fit into your system memory. If you really need files that large you should consider using a data base, or you could give the [ff package](https://cran.r-project.org/web/packages/ff/index.html) a shot. I have not worked with that, but I think ff objects have some limitations compared to data.frames, so it also depends on what you want to do with your data whether this would be useful for you or not – Stefan F Aug 09 '17 at 16:47
  • OK, this isn't the solution I chose, but it's satisfactory for the problem as posed, so I'll mark it as accepted – asdf asdf Aug 09 '17 at 17:08
  • Sorry, R just gets cumbersome with out-of-memory data :/ I guess the real answer is "get more RAM" – Stefan F Aug 09 '17 at 18:29
  • My approach to running queries on very large (compressed) csv files: https://stackoverflow.com/a/68693819/8079808 It is a demo of using `readr::read_csv_chunked` – San Aug 09 '21 at 10:35