
Trying to View an RData file that is 650,000 KB on a machine with 8 GB of RAM, but I keep getting this error:

Reached total allocation of 7758Mb: see help(memory.size)

Error in View : cannot allocate vector of size 54.6 Mb

I had initially imported this file (2.5 GB) using read.table with great difficulty due to the size. Code used:

A <- read.table(file.choose(), header = TRUE, sep = "|", fill = TRUE,
                nrows = 9000000, stringsAsFactors = TRUE)
save(A, file = "A.RData")
load("A.RData")
View(A)
  • The answer depends on what you need to do once all the data is read. So what do you need to do? Modelling? Data cleaning? – RJ- Aug 21 '14 at 06:42
  • The data is clean already; I want to start using it in analysis. – Amit Verma Aug 21 '14 at 06:50
  • Also, I forgot to mention that I am fairly new to R! – Amit Verma Aug 21 '14 at 06:56
  • What sort of analysis do you have in mind? Modelling? You probably need to be very specific about what you want. – RJ- Aug 21 '14 at 06:56
  • Remove `View(A)`... I guess you are using RStudio and you're trying to visualize the data. It's too big; you should visualize a subset of it, e.g. use `head(A)`, `tail(A)`, etc. – digEmAll Aug 21 '14 at 06:57

1 Answer


Given the size of your file, I would suggest using the ff and ffbase packages and reading directly from the text file with read.table.ffdf. This circumvents the memory requirements by keeping the data on the hard drive while still letting you operate on it as if it were in RAM. Unfortunately, not many functions are implemented for ffdf objects at the moment, so depending on the modelling you would like to do, you might need to write your own implementations of the tools you need.
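A minimal sketch of that approach, assuming the same pipe-delimited layout as in the question; the file name "data.txt" and the chunk sizes are placeholders:

library(ff)
library(ffbase)

# Read the file chunk by chunk into an on-disk ffdf object;
# only one chunk is held in RAM at a time. Extra arguments
# (header, sep, fill) are passed through to read.table.
A <- read.table.ffdf(file = "data.txt", header = TRUE, sep = "|",
                     fill = TRUE, first.rows = 100000,
                     next.rows = 500000, VERBOSE = TRUE)

dim(A)       # dimensions, without loading the data into memory
A[1:10, ]    # subscripting a small slice returns an ordinary data.frame

Because the read is chunked, peak memory usage is bounded by the chunk size rather than the file size.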

Other than that, you might try setting up a Hadoop cluster and using RHadoop or something similar, but I'm not really an authority on that.

  • They should first try using the data.table package. If they don't do something unreasonable (like `View`ing data with several million observations), RAM might suffice. – Roland Aug 21 '14 at 07:53
  • See the `nrows` in the beginning; that is precisely what the asker is doing. Also, as far as I can tell, the problem is with reading the file in the first place, and I'm not sure data.table can help with that. – Martin Markov Aug 21 '14 at 10:07
  • As I said, it's unreasonable to `View` such large data. And it's also pretty useless. data.table's `fread` shouldn't have a problem with reading a 2.5 GB file on an 8 GB machine. However, it is missing a `fill` argument and, fortunately, doesn't convert characters to factors. – Roland Aug 21 '14 at 10:52
  • @martinMarkov, thanks, I'll give that a go and hopefully it works. – Amit Verma Aug 23 '14 at 04:17
  • By the way, since View cuts its output to the first 1000 rows anyway, you can just do `View(A[1:1000, ])` and get pretty much the same thing without the insane memory requirements. You can also cut it to even fewer rows. – Martin Markov Aug 25 '14 at 06:49
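A minimal sketch combining the suggestions from the comments above (data.table's fread for the fast read, then viewing only a small slice); the file name "data.txt" is a placeholder, and note that fread had no fill argument at the time, so the rows must be well-formed:

library(data.table)

# Fast, memory-friendly read of the pipe-delimited file
# ("data.txt" stands in for the actual path)
A <- fread("data.txt", sep = "|", header = TRUE)

# Inspect only the first 1000 rows instead of View()ing all 9 million
View(A[1:1000, ])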