I have this for loop inside a function that reads specific columns from the files in a folder. Because I have many files, it is very slow.

How could I rewrite this in data.table format?

I use arrange() because afterwards I bind the two data frames by name. The names are the same in both files, but they are not in the same order. To bind the columns class1 and class2 by name, I first sort each data frame with arrange().

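library(readr)    # read_table()
library(dplyr)    # arrange(), %>%
library(stringr)  # str_remove()

for (i in seq_along(temp)) {

    # Name and class columns from the "_automat" file, sorted by name
    df1 <- read_table(temp[[i]],
                      col_types = "c________________f__",
                      col_names = c("name", "class1")) %>%
      arrange(name)

    # Same columns from the matching file without the "_automat" suffix
    df2 <- read_table(str_remove(temp[[i]], "_automat"),
                      col_types = "c________________f__",
                      col_names = c("name", "class2")) %>%
      arrange(name)
}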
user438383
Wilson Souza

1 Answer

If you just want to convert this to data.table, you can switch from read_table() to fread(), which is usually faster and returns a data.table that you can sort with [order(*)]:

library(data.table)

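# Selecting by name like this assumes the file has a header row
# containing columns called `name` and `class1`
fread(file = temp[[i]], select = c(name = 'character', class1 = 'numeric'))[order(name)]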
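If the files have no header row (the question passes col_names to read_table, which suggests they do not), the columns can be selected by position instead. The following is only a sketch under assumptions: the helper names read_class, read_pair and make_demo_file are hypothetical, the class column is assumed to be column 18 based on the col_types string in the question, and the demo files are made up so that the example runs on its own. It also uses merge() on name rather than sorting both tables, so the row order in each file no longer matters.

```r
library(data.table)

# Hypothetical helper: read the name column (1) and the class column
# (assumed to be 18) from a headerless, whitespace-delimited file.
read_class <- function(path, class_col) {
  fread(path, header = FALSE, select = c(1L, 18L),
        col.names = c("name", class_col))
}

# Hypothetical helper: read an "_automat" file and its counterpart,
# then join them on name instead of relying on sorted row order.
read_pair <- function(path) {
  dt1 <- read_class(path, "class1")
  dt2 <- read_class(sub("_automat", "", path, fixed = TRUE), "class2")
  merge(dt1, dt2, by = "name")
}

# Made-up 20-column demo files so the sketch is self-contained.
make_demo_file <- function(path, nm, cls) {
  df <- data.frame(matrix(0L, nrow = length(nm), ncol = 20))
  df[[1]] <- nm
  df[[18]] <- cls
  fwrite(df, path, sep = " ", col.names = FALSE)
}

demo_dir <- tempfile("demo_")
dir.create(demo_dir)
f_auto <- file.path(demo_dir, "sample_automat.txt")
make_demo_file(f_auto, c("b", "a"), c("x", "y"))
make_demo_file(file.path(demo_dir, "sample.txt"), c("a", "b"), c("p", "q"))

res <- read_pair(f_auto)
print(res)
```

With the real data, something like lapply(temp, read_pair) would return one merged data.table per file pair.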

That might speed things up somewhat, but for a more significant improvement I'd look into replacing your for loop with a parallel foreach loop from the foreach package. There are a number of questions about how to do that; a good place to start is: run a for loop in parallel in R
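As a rough illustration of the parallel approach, here is a minimal sketch using foreach with the doParallel backend. The choice of doParallel, the two-worker cluster, and the small demo files (a hypothetical stand-in for `temp`) are all assumptions made for the example. Whether this actually helps depends on whether reading is limited by CPU or by disk, so it is worth timing on the real files.

```r
library(data.table)
library(foreach)
library(doParallel)

# Made-up demo files standing in for `temp` (with a header row here).
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
files <- file.path(demo_dir, sprintf("file%d.txt", 1:4))
for (f in files) {
  fwrite(data.table(name = c("b", "a"), class1 = c(1, 2)), f, sep = " ")
}

# Start a small cluster; the number of workers is an arbitrary choice.
cl <- makeCluster(2)
registerDoParallel(cl)

# Each worker reads one file and sorts it by name; the results come
# back as a list with one data.table per file.
results <- foreach(f = files, .packages = "data.table") %dopar% {
  fread(f)[order(name)]
}

stopCluster(cl)
```

The .packages argument makes data.table available on each worker, which is needed because the workers are separate R sessions.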

divibisan