I'm trying to import a large number of text files and merge them into a single data table using the script below, so that I can parse the text. The files were originally eml files, so the formatting is a mess. I'm not interested in separating the text into fields; it would be perfectly fine if the table had only one field holding all the text from the files. When I run the script, I keep getting the following error:

Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match 

I've tried setting `sep=` to various values, and also omitting it, but I still get the same error. I've also tried the same code with read.csv in place of read.table, with the same result. Any tips would be greatly appreciated.

setwd("~/stuff/folder")
file_list <- list.files()
for (file in file_list){
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=FALSE,fill=TRUE,comment.char="",strip.white = TRUE)
  }
  # if the merged dataset does exist, append to it
  if (exists("dataset")){
    temp_dataset <-read.table(file, header=FALSE,fill=TRUE,comment.char="",strip.white = TRUE)
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}
hrbrmstr
user3476463
  • I'd suggest replacing the `for…` construct with `lapply`, then wrapping that with either `dplyr::bind_rows` or `data.table::rbindlist` (setting `fill=TRUE` for the latter) and assigning the result to your `dataset` variable, rather than building it piecemeal. – hrbrmstr Oct 02 '15 at 03:15
  • If you go with `data.table` you may find its `fread` function useful to speed up the reading. – dtrv Oct 02 '15 at 05:16
  • Possible duplicate of [Reading multiple files into R - Best practice](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice) – Jaap Oct 02 '15 at 05:59

1 Answer

I think something lighter could work for you and may avoid this specific error:

them.files <- lapply(1:number.of.files, function(x)
  read.table(paste0("lolz", x, ".txt"), header=FALSE, fill=TRUE, comment.char="", strip.white=TRUE))

Adapt the function to whatever your file names are.

Edit: Actually maybe something like this could be better:

them.files <- lapply(seq_along(file_list), function(x)
  read.table(file_list[x], header=FALSE, fill=TRUE, comment.char="", strip.white=TRUE))

Merging step:

everyday.Im.merging <- do.call(rbind,them.files)
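
One caveat: `do.call(rbind, ...)` still needs every piece to have the same number of columns, which messy eml text rarely gives you. Since the question says a single text field would be fine, here is a minimal sketch that reads each file with `readLines` instead of `read.table`. The demo folder, file names and file contents are made up for illustration, and the extra `file` column is my own addition, not something the question asked for.

```r
# Hypothetical demo files in a temporary folder, standing in for the real eml text.
demo_dir <- file.path(tempdir(), "eml_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("From: a@example.com", "Subject: hi there", "Body text"),
           file.path(demo_dir, "mail1.txt"))
writeLines(c("one two three four five", "six"),
           file.path(demo_dir, "mail2.txt"))

file_list <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file as raw lines: one row per line, no splitting into fields.
one_column <- lapply(file_list, function(f) {
  lines <- readLines(f, warn = FALSE)
  data.frame(file = rep(basename(f), length(lines)),  # which file the line came from
             text = lines,
             stringsAsFactors = FALSE)
})

# Every piece has exactly the same two columns, so rbind has nothing to mismatch.
dataset <- do.call(rbind, one_column)
print(dim(dataset))
```

Because no field splitting happens, the number of words on a line no longer matters.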

I am sure there are beautiful ways to do it with dplyr or data.table but I am a caveman.
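
For what it's worth, here is a rough sketch of the `data.table` route suggested in the comments, assuming the package is installed. `rbindlist(..., fill = TRUE)` pads missing columns with `NA` instead of stopping. The two demo files and their contents are hypothetical.

```r
# Hypothetical demo files that parse to different numbers of columns.
demo_dir <- file.path(tempdir(), "fill_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a b c", file.path(demo_dir, "f1.txt"))  # 3 columns
writeLines("d e",   file.path(demo_dir, "f2.txt"))  # 2 columns
file_list <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Same read.table arguments as in the question, applied to every file.
tables <- lapply(file_list, read.table, header = FALSE, fill = TRUE,
                 comment.char = "", strip.white = TRUE,
                 stringsAsFactors = FALSE)

# Only run the merge if data.table is available (assumption: it is installed).
if (requireNamespace("data.table", quietly = TRUE)) {
  # fill = TRUE matches columns by name (V1, V2, ...) and fills the gaps with NA.
  merged <- data.table::rbindlist(tables, fill = TRUE)
  print(dim(merged))
}
```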

If I may add something, I would also add a checking step before the `do.call(rbind, ...)` merge:

sapply(them.files,str)
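
`str` prints a lot when there are many files, so a more compact check may help: compare the column counts directly. This is only a sketch. The two small data frames below are stand-ins for what `read.table` would return, and `n_cols` and `mismatched` are names I made up.

```r
# Stand-in for the list produced by lapply(..., read.table): two pieces
# with different numbers of columns (hypothetical data).
them.files <- list(
  data.frame(V1 = "a", V2 = "b", V3 = "c", stringsAsFactors = FALSE),
  data.frame(V1 = "d", V2 = "e", stringsAsFactors = FALSE)
)

# Column count of every piece; rbind needs these to be identical.
n_cols <- vapply(them.files, ncol, integer(1))
print(table(n_cols))

# Positions of the pieces whose column count differs from the first one.
mismatched <- which(n_cols != n_cols[1])
print(mismatched)
```

If `mismatched` is not empty, those are the files that trigger the "numbers of columns of arguments do not match" error.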
Julian Wittische