
I am working with big data and have a 70GB JSON file. I am using the jsonlite library to load the file into memory.

I have tried an AWS EC2 x1.16xlarge machine (976 GB RAM) to perform this load, but R breaks with the error Error: cons memory exhausted (limit reached?) after loading 1,116,500 records. Thinking that I did not have enough RAM, I tried to load the same JSON on a bigger EC2 machine with 1.95 TB of RAM.

The process still broke after loading 1,116,500 records. I am using R version 3.1.1 and executing it with the --vanilla option. All other settings are at their defaults.

Here is the code:

library(jsonlite)
data <- jsonlite::stream_in(file('one.json'))

Any ideas?

  • My guess is that you're running out of memory, even on the larger EC2 instance. Have a look [here](http://stackoverflow.com/questions/1395270/determining-memory-usage-of-objects) to see if you can figure out how much memory, e.g., 100 records take, then extrapolate to the full size of your data set. R operates completely in memory, so once you exceed that, expect unpleasant things to happen. – Tim Biegeleisen Oct 19 '16 at 22:57
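
To follow that suggestion, here is a minimal sketch of the extrapolation, assuming you can put a small sample of the records (say the first 100 lines of one.json) into a hypothetical sample.json:

library(jsonlite)

# Load a small sample (e.g. the first 100 records) and measure its in-memory size;
# 'sample.json' is a hypothetical file holding just that sample.
sample_records <- jsonlite::stream_in(file('sample.json'))
print(object.size(sample_records), units = 'Mb')

# Extrapolate: multiply the reported size by (total records / nrow(sample_records))
# to estimate roughly how much RAM the full data set would need as a data frame.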

1 Answer


There is a handler argument to stream_in that lets you process the data in batches as it streams in, instead of accumulating everything into a single data frame. You could, for example, write each parsed batch to a file or filter out the data you do not need; a sketch follows.
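
A minimal sketch of that approach, assuming you only need two columns (the names 'id' and 'value' are hypothetical) and want the reduced records written back out as line-delimited JSON rather than held in memory:

library(jsonlite)

# Open an output connection once; each processed batch is appended to it.
out <- file('filtered.json', open = 'w')

# stream_in reads 'one.json' in pages; the handler is called on each page
# (a data frame), so the full data set never has to sit in memory at once.
jsonlite::stream_in(file('one.json'), handler = function(df) {
  keep <- df[, c('id', 'value')]            # drop the columns you don't need
  jsonlite::stream_out(keep, out, verbose = FALSE)
}, pagesize = 1000)

close(out)

The filtered file can then be reloaded with stream_in, or the handler can be changed to aggregate, write CSV, insert into a database, etc.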

– Karsten W.