
I have 16 million customer records with more than 100 columns, and I want to load the complete dataset into R so I can run my R code on it.

I have used the following to load the data in R:

dat <- read.table("D:/data.txt", header = TRUE, sep = "þ",
                  skipNul = TRUE, strip.white = TRUE,
                  fill = TRUE, check.names = TRUE,
                  na.strings = "NA", quote = "")

However, my system hung.

Is there an efficient and effective way to read big data into R?

Jeromy Anglim
user3642360
  • How "big" is your device? – IRTFM Jul 25 '14 at 05:54
  • This call is checking so many things that it should be quite slow. Have you tried reducing the argument list and possibly reading in chunks (see the sketch after these comments)? Ironically, you leave out the arguments that are recommended for maximum efficiency with `read.table`. – Rich Scriven Jul 25 '14 at 05:54
  • If the damned thing won't fit into your available memory, you will have to read the [High performance task view](http://cran.r-project.org/web/views/HighPerformanceComputing.html). – Roman Luštrik Jul 25 '14 at 06:08
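
A minimal sketch of the chunked approach suggested above, assuming a 100,000-row chunk size and that every field can safely be read as character (the file path and delimiter come from the question; everything else is an assumption). Supplying `colClasses` explicitly is one of the arguments the `read.table` help page recommends for speed:

# Read the file in fixed-size chunks so the whole table never has to
# fit in memory at once. Assumes the first line holds the header.
con <- file("D:/data.txt", open = "r")
header <- strsplit(readLines(con, n = 1), "þ", fixed = TRUE)[[1]]
repeat {
  lines <- readLines(con, n = 100000, skipNul = TRUE)
  if (length(lines) == 0) break        # end of file reached
  chunk <- read.table(text = lines, sep = "þ", header = FALSE,
                      col.names = header, colClasses = "character",
                      quote = "", fill = TRUE)
  # ... process or aggregate `chunk` here, keeping only what you need ...
}
close(con)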

1 Answer

library(data.table)

DT <- fread("D:/data.txt")

If you are dealing with data of that size, you will probably want to be using data.table anyway ;)
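
A hedged extension of the answer, not something it shows itself: since the file uses þ as a delimiter and has 100+ columns, you can pass the separator explicitly and use `fread`'s `select` argument to load only the columns you actually need, which cuts memory use considerably. The column names below are hypothetical:

library(data.table)

# Load only two (hypothetical) columns; sep and na.strings match the question.
DT <- fread("D:/data.txt", sep = "þ", na.strings = "NA",
            select = c("customer_id", "balance"))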

Ricardo Saporta
  • I have used the following: `d <- fread("E:/big.txt", sep = "þ", nrows = -1L, header = TRUE, na.strings = "NA", stringsAsFactors = FALSE, verbose = FALSE, autostart = 30L, skip = -1L, select = NULL, drop = NULL, colClasses = NULL, integer64 = getOption("datatable.integer64"), showProgress = getOption("datatable.showProgress"))`, but it gives the following error: `Error in fread("E:/big.txt", sep = "þ", nrows = -1L, header = TRUE, na.strings = "NA", : embedded nul in string: '\0\n\07\09\07\03\00\09\01\07\05\0'` – user3642360 Jul 25 '14 at 06:06
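
One possible workaround for that "embedded nul" error (an assumption, not a fix confirmed in this thread): strip the NUL bytes out of the file first, then hand the cleaned copy to `fread`. `readLines(skipNul = TRUE)` drops embedded nuls, though it holds all of the raw text in memory, so it only helps if the machine has RAM to spare; on a Unix-like shell, `tr -d '\000' < big.txt > clean.txt` does the same job without loading the file into R.

library(data.table)

# Drop embedded NUL bytes, write a clean copy, then fread the copy.
lines <- readLines("E:/big.txt", skipNul = TRUE)
clean <- tempfile(fileext = ".txt")
writeLines(lines, clean)
DT <- fread(clean, sep = "þ", na.strings = "NA")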