
I'm working with a large tab-delimited file (110 columns, 2 million rows). The file contains text, dates, and numbers. I want to load all of it into R for analysis, but I can't successfully load all of it.

I've used the code below, and it loads all of my columns but only ~400 observations. I can't figure out why only this small portion of the file is being loaded, and I'm not receiving any errors. Any insight into why this is happening, or an alternative way to load the data, would be appreciated.

    audfeed <- read.table("Audience_Feed_Validation.txt", header = TRUE,
                          fileEncoding = "UTF-16LE", fill = TRUE,
                          na.strings = "NA", sep = "\t",
                          stringsAsFactors = FALSE)
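Since `fill = TRUE` is already set and no error appears, one common cause of silently "losing" rows in `read.table` is a stray quote or `#` character in the data, which makes the parser swallow everything up to the next matching character. A minimal sketch of that failure mode and the fix (using a toy UTF-16LE file in place of the real 2-million-row feed):

```r
# Sketch: a stray quote in the data can make read.table silently merge rows.
# A small UTF-16LE file stands in for Audience_Feed_Validation.txt here.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "w", encoding = "UTF-16LE")
writeLines(c("id\tname\tvalue",
             "1\tacme \"north\tfoo",   # note the unmatched double quote
             "2\tplain\tbar",
             "3\tplain\tbaz"), con)
close(con)

# Disabling quote and comment handling makes every physical line a row:
audfeed <- read.table(tmp, header = TRUE, sep = "\t",
                      fileEncoding = "UTF-16LE", fill = TRUE,
                      na.strings = "NA", stringsAsFactors = FALSE,
                      quote = "", comment.char = "")
nrow(audfeed)
```

With the default `quote = "\""`, the same file comes back with fewer rows; with `quote = ""` and `comment.char = ""` all three data rows load.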
    I'd use `data.table::fread` or `readr::read_delim` since both are much quicker than `read.table`. Hard to diagnose the problem you've got without seeing your file. – Nick Kennedy Jul 30 '15 at 19:18

1 Answer


Try the `fread` function in the `data.table` package. It is very fast and efficient.
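For reference, a minimal `fread` call on a small tab-delimited file might look like this (a sketch with toy data; `fread` auto-detects the separator in most cases, so several of these arguments are optional):

```r
# Sketch: basic fread usage on a toy tab-delimited file
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\talpha", "2\tbeta"), tmp)

# Guarded so the sketch still runs where data.table is not installed
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(tmp, sep = "\t", na.strings = "NA")
  print(dim(dt))
}
```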

  • Tried `fread` but get this: Error in fread("Audience_Feed_Validation.txt", sep = "\t", na.strings = "NA", : embedded nul in string: 'ÿþc\0u\0s\0t\0o\0m\0e\0r\0_\0i\0d\0' – pateljat Jul 30 '15 at 20:34
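That error is the encoding: `ÿþ` is the UTF-16LE byte-order mark, and the `\0` bytes are the high bytes of UTF-16 characters, so `fread` (which at the time had no file-encoding argument) is seeing raw UTF-16 bytes. One workaround (a base-R sketch; a command-line `iconv` would do the same job) is to re-encode the file to UTF-8 once and `fread` the copy:

```r
# Sketch: re-encode UTF-16LE -> UTF-8, then fread the UTF-8 copy.
# A toy file stands in for Audience_Feed_Validation.txt here.
src <- tempfile(fileext = ".txt")
con <- file(src, open = "w", encoding = "UTF-16LE")
writeLines(c("customer_id\tvalue", "1\tfoo", "2\tbar"), con)
close(con)

dst <- sub("\\.txt$", "_utf8.txt", src)
in_con  <- file(src, open = "r", encoding = "UTF-16LE")
out_con <- file(dst, open = "w", encoding = "UTF-8")
writeLines(readLines(in_con), out_con)  # reads all lines into memory; fine for a one-off conversion
close(in_con); close(out_con)

# Guarded so the sketch still runs where data.table is not installed
if (requireNamespace("data.table", quietly = TRUE)) {
  audfeed <- data.table::fread(dst, sep = "\t", na.strings = "NA")
}
```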