
I ran some simulations (coded in Julia) from which I have tons of data to extract. I want to analyze these data (which I dumped into .txt files) in R, but importing the .txt files takes a long time.

Are there any tips to improve the speed of data import in R?

  • I am using `read.table()`. Is there a faster function? (See the first sketch after this list.)
  • My .txt files are relatively wide (many columns) but relatively short (few rows). Would changing the layout of the files improve performance? Should I aim for a square matrix, or should I dump all the data into one line (or one column)?
  • I have lots of boolean data. Would it be clever to replace my ones and zeros with `TRUE` and `FALSE` (or `T` and `F`)? (Along the same lines, I noticed that `sum()` is faster on a vector of `TRUE`/`FALSE` than on the same vector of 1/0.)
  • Could I write the data to binary files and read those in R instead? Would that speed up the import? Would it slow down my simulations in Julia? (See the second sketch below.)
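
For context, a minimal sketch of the usual first steps, assuming a tab- or space-separated file named `results.txt` with a header row (the filename and layout are placeholders): telling `read.table()` the column classes up front spares it a type-guessing pass, and `fread()` from the `data.table` package is typically much faster still.

```r
library(data.table)  # provides fread()

## read.table() guesses every column's class unless told otherwise;
## declaring the classes up front (a single value is recycled across
## all columns) skips that pass, and comment.char = "" skips the
## comment scan. If the file stores 0/1 use "integer"; if it stores
## TRUE/FALSE, colClasses = "logical" parses them directly.
d1 <- read.table("results.txt", header = TRUE,
                 colClasses = "integer", comment.char = "")

## fread() auto-detects the separator and is usually the fastest
## plain-text reader available in R.
d2 <- fread("results.txt")
```

If the columns are a mix of types, pass a character vector to `colClasses` instead of a single recycled value.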
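On the binary question: raw binary round-trips are usually the fastest option, and they should cost the Julia side essentially nothing, since Julia's `write(io, A)` on a numeric array already dumps the raw bytes. A minimal sketch of the R side, assuming a file `results.bin` of `Float64` values written column-major on a little-endian machine (the filename, dimensions, element size, and byte order are all assumptions that must match the Julia writer):

```r
nr <- 100     # number of rows    -- assumed, must match the writer
nc <- 50000   # number of columns -- assumed

## Read nr * nc raw Float64 values in one call.
x <- readBin("results.bin", what = "double", n = nr * nc,
             size = 8, endian = "little")

## Julia and R both store matrices column-major, so the vector can
## be reshaped directly.
m <- matrix(x, nrow = nr, ncol = nc)
```

For the boolean data, one byte per value (e.g. `UInt8` on the Julia side) read back with `what = "integer", size = 1` and then `as.logical()` would keep the files small; that encoding is likewise an assumption, not something fixed by either language.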
Remi.b
  • 17,389
  • 28
  • 87
  • 168
  • Try `fread` from `data.table` for your first question. – akrun Jan 07 '15 at 08:39
  • A few long columns are always better than many short columns for import (because data.frame columns are just vectors in a list). Boolean data should be stored as `TRUE`/`FALSE` (and if those are dummies, a single `factor` column instead of many dummy columns might also be an option). – Roland Jan 07 '15 at 08:48
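
Picking up Roland's point: a data.frame allocates one R vector per column, so thousands of short columns are slow to build. When every value has the same type, one option (a sketch; the filename and dimensions are assumptions) is to dump a flat whitespace-separated stream from Julia and rebuild a matrix in R with `scan()`, avoiding the data.frame machinery entirely:

```r
## scan() reads a homogeneous stream into a single vector with no
## per-column overhead; 0/1 values are parsed as integers here.
x <- scan("results_flat.txt", what = integer())

## Reshape to the original dimensions (the fill order must match how
## the values were written) and convert 0/1 to logical in one step.
m <- matrix(as.logical(x), nrow = 100)   # nrow is assumed
```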

0 Answers