
So I had a friend help me with some R code, and I feel bad asking because the code works, but I have a hard time understanding and changing it, and I have a feeling that it's not correct or proper code.

I am loading files into separate R dataframes, labeled x1, x2... xN etc.

I want to combine the dataframes and this is the code we got to work:

    assign("x", eval(parse(text = paste("rbind(",
        paste("x", rep(1:length(toAppend)), sep = "", collapse = ", "),
        ")", sep = ""))))

"toAppend" is a list of the files that were loaded into the x1, x2 etc. dataframes.

Without all the text-to-code tricks, it should be something like:

    x <- rbind(## x1 through xN, or some loop for 1:length(toAppend) ##)

Why can't R take the code without the evaluate-text trick? Is this good code? Will I get fired if I use this IRL? Do you know a proper way to write this out as a loop instead? Is there a way to do it without a loop? Once I combine these files/dataframes, I have a data set over 30 million lines long, which is very slow to work with using loops. It takes more than 24 hours to run my example line of code to build the 30M-line data set from ~400 files.
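
For reference, here is a minimal loop-free sketch of the combine step, assuming the dataframes really are named `x1` through `xN` in the global environment (the small `x1`/`x2` dataframes and the `toAppend` vector below are placeholders for illustration). `mget()` collects the objects into a list by name, and `do.call()` splices that list into a single `rbind()` call, so no `eval(parse())` is needed:

```r
# Placeholder dataframes standing in for the real x1..xN:
x1 <- data.frame(a = 1:2)
x2 <- data.frame(a = 3:4)
toAppend <- c("file1.csv", "file2.csv")   # placeholder for the real file list

# mget() fetches objects by name into a list; do.call() turns that
# list into the arguments of one rbind() call.
x <- do.call(rbind, mget(paste0("x", seq_along(toAppend))))
nrow(x)   # 4
```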

GregS
  • Can you make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for us to test with? If not, the R idiom would be something like `do.call(data.frame, lapply(file_names, read.csv))`. I'd also suggest breaking your problem into a few separate chunks and reading any of the excellent "Intro to R" guides. – Justin Jan 08 '13 at 22:03
  • It looks like they want to paste these things on top of each other and not side by side so `do.call(rbind, lapply(file_names, read.csv))` would be more appropriate. – Dason Jan 08 '13 at 22:04
  • 3
    Regardless - it seems that putting the objects into a list and using `do.call` with the appropriate function is what the OP will probably want to do. – Dason Jan 08 '13 at 22:05
  • @Dason you're correct, I mistyped. – Justin Jan 08 '13 at 22:07
  • You should consider looking at the `fread` function from the latest (development? version # is 1.8.7) version of the `data.table` package, and at the `data.table` package more generally ... – Ben Bolker Jan 08 '13 at 22:18
  • Yeah looking into data.table would definitely be beneficial for the OP if they're working with large data and want speed increases. – Dason Jan 08 '13 at 23:06
  • and if you're using data.table, you can use `rbindlist`, which is the super-fast version of `do.call(rbind, list)` – mnel Jan 09 '13 at 00:00
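
The `rbindlist` suggestion above would look roughly like the sketch below, assuming the `data.table` package is installed and the inputs are CSV files (the temporary files here are created only so the example is self-contained; `file_names` would normally come from your own directory listing):

```r
library(data.table)   # provides fread() and rbindlist(); must be installed

# Create three small placeholder CSVs so the sketch runs on its own:
dir <- tempdir()
for (i in 1:3) {
    write.csv(data.frame(a = i, b = i * 10),
              file.path(dir, paste0("x", i, ".csv")), row.names = FALSE)
}

# fread() is a fast CSV reader; rbindlist() stacks a list of tables
# much faster than do.call(rbind, ...) does on millions of rows.
file_names <- list.files(dir, pattern = "^x[0-9]+\\.csv$", full.names = TRUE)
x <- rbindlist(lapply(file_names, fread))
nrow(x)   # 3
```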

1 Answer


If these dataframes all have the same structure, you will save considerable time by using the 'colClasses' argument in the read.table or read.csv steps. lapply can pass this through to the read.* functions, and if Dason's guess at what you were really doing is right, it would be:

 x <- do.call(rbind, lapply(file_names, read.csv,
                            colClasses = c("numeric", "Date", "character")
                            ))   # whatever the ordered sequence of classes might be

The reason that rbind cannot take your character vector is that the names of objects are 'language' objects, while a character vector is ... just not a language type. Pushing character vectors through the semi-permeable membrane separating 'language' from 'data' in R requires using assign, do.call, eval(parse()), environments, Reference Classes, or perhaps other methods I have forgotten.
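
A tiny sketch of crossing that membrane: a string such as `"x1"` is plain data, and `get()` is the conventional way to retrieve the object whose *name* matches it, giving the same result as the `eval(parse())` route without building code from text (the `x1` dataframe and `nm` variable here are illustrative):

```r
x1 <- data.frame(a = 1:2)
nm <- "x1"                        # a character vector: plain data, not code

# eval(parse()) route -- works, but fragile and hard to read:
via_parse <- eval(parse(text = nm))

# get() route -- the conventional way across the membrane:
via_get <- get(nm)

identical(via_parse, via_get)     # TRUE
```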

IRTFM
  • Thank you! With the information I gave, this eventually helped. The original code added the file name to a new column as the data was being read, but I lost this when I used this answer's code. The file-name column was necessary for me to uniquely identify where the data came from. Instead, I modified the source data files (after backing them up) to add the desired column earlier in my script. – GregS Feb 16 '13 at 01:44