
I have a fairly large dataset from an experiment I am running. My experiment emits data as a CSV file. However, one of the fields in the CSV is itself a space-separated list of values. How can I efficiently represent this in R?

Right now I parse the csv into a data frame, and then convert the variable field into a list of smaller data frames. Logically, this represents the data well, but it uses a ton of memory. R only uses ~150MB to parse the csv file, but the conversion of the variable field uses 8GB, at which point my machine runs out of memory.

Ed McMan
  • The variable field only needs to be a list of vectors, not a list of data frames, surely? For example, `list(c(1,34,3),c(5,2,1,2,3),c(5,5))`, assuming it's all one type. – Spacedman Apr 13 '15 at 13:13
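
A minimal base R sketch of the list-of-vectors idea from the comment above. The question gives no sample data, so the CSV text and the column names `id` and `values` are made up for illustration, and the values are assumed to be numeric.

```r
# Hypothetical sample: `id` and `values` are assumed column names.
csv_text <- "id,values
1,1 34 3
2,5 2 1 2 3
3,5 5"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Split each space-separated string and convert the pieces to numbers.
# The result is a plain list with one numeric vector per row.
values <- lapply(strsplit(df$values, " ", fixed = TRUE), as.numeric)

# For comparison: the same data stored as a list of one-column data frames.
as_frames <- lapply(values, function(v) data.frame(value = v))

# Each data frame carries its own names, row names, and class attributes,
# so the list of data frames is larger than the list of bare vectors.
print(object.size(values))
print(object.size(as_frames))
```

The per-element overhead is small in absolute terms, but it is paid once per row, so it can add up on a large dataset.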

1 Answer


I would check out the data.table package, which is on CRAN. Use its fread() to load your data. A data.table inherits from data.frame, so most data frame code still works, but it handles large data better than base R. If you're decent at R, the package is not too hard to learn.

Without a reproducible example, I cannot provide any additional coding tips.
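
That said, here is a rough sketch on made-up data of what this could look like. The column names `id` and `values` and the sample rows are assumptions, not taken from the question. It loads the CSV with fread() and then flattens the space-separated field into a "long" table with one row per value, instead of building one small data frame per row.

```r
library(data.table)

# Hypothetical sample: `id` and `values` are assumed column names.
csv_text <- "id,values\n1,1 34 3\n2,5 2 1 2 3\n3,5 5\n"

# Set sep explicitly so the spaces inside `values` are not treated as delimiters.
dt <- fread(text = csv_text, sep = ",")

# One row per individual value, keyed by the id of the original CSV row.
long <- dt[, .(value = as.numeric(unlist(strsplit(values, " ", fixed = TRUE)))),
           by = id]

print(long)
```

With a real file, `fread("yourfile.csv", sep = ",")` would replace the `text =` call. The values for a single row can then be pulled out with something like `long[id == 2, value]`.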

Richard Erickson