How to rewrite an R for(){} loop from dplyr to data.table?

Question

I have this for(){} loop inside a function that reads specific columns from the files in a folder. Because I have many files, it is very slow.

How could I rewrite this using data.table?

I use arrange() because afterwards I bind the two data frames by name. The names are the same in both files, but not in the same order. The class1 and class2 columns have to be bound by name, which is why I use arrange().

for (i in 1:length(temp)) {
    
    df1 <- read_table(temp[[i]],
                      col_types = "c________________f__",
                      col_names = c("name", "class1")) %>% 
      arrange(name)
    
    df2 <- read_table(str_remove(temp[[i]], "_automat"),
                      col_types = "c________________f__",
                      col_names = c("name", "class2")) %>% 
      arrange(name)
}

score 1 · Accepted Answer · answered Nov 11 '21 at 15:48

If you just want to convert this to data.table, you can switch from read_table to fread, which is supposed to be faster and which returns a data.table that you can sort with [order(*)]:

library(data.table)

fread(file=temp[[i]], select = c(name='character', class1='numeric'))[order(name)]
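The call above selects columns by name, which only works if the files have a header row containing name and class1. The question instead picks columns by position through col_types, so here is a minimal sketch of that variant. It assumes the files have no header row and use a delimiter that fread can detect. read_pair and read_all are hypothetical helper names, and the merge() step is a swapped-in alternative to sorting both tables and binding them.

```r
library(data.table)

# Column spec copied from the question; "_" marks a skipped column.
spec <- "c________________f__"
keep <- which(strsplit(spec, "")[[1]] != "_")  # positions of the columns to read

# Hypothetical helper: read the two kept columns of one file.
# Assumes no header row and a delimiter that fread can detect.
read_pair <- function(path, class_col) {
  fread(path, header = FALSE, select = keep,
        col.names = c("name", class_col))
}

# Hypothetical wrapper: one merged table per "_automat" file in temp.
read_all <- function(temp) {
  lapply(temp, function(path) {
    df1 <- read_pair(path, "class1")
    df2 <- read_pair(sub("_automat", "", path, fixed = TRUE), "class2")
    merge(df1, df2, by = "name")  # match rows by name instead of sorting and binding
  })
}
```

Calling read_all(temp) with the vector of file paths from the question returns a list of data.tables, each holding name, class1 and class2. Because merge() matches rows on name, the two files do not need to be sorted first.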

That might speed things up somewhat, but if you want more significant improvements, I'd look into replacing your for loop with a parallel foreach loop from the foreach package. There are a number of questions about how to do that, but you might want to start here: run a for loop in parallel in R
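A rough sketch of what the parallel version could look like, assuming the doParallel package as the backend (foreach needs one, and this is only one of several options). read_parallel and the workers argument are hypothetical names. The fread call is the one from above, so it carries the same assumption about a header row.

```r
library(data.table)
library(foreach)
library(doParallel)

# Hypothetical wrapper: read and sort each file on a separate worker.
read_parallel <- function(temp, workers = 2) {
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl))  # always shut the workers down
  registerDoParallel(cl)
  # .packages loads data.table on every worker; the result is a list
  foreach(path = temp, .packages = "data.table") %dopar% {
    # selecting by name assumes a header row with these column names
    fread(file = path, select = c(name = "character", class1 = "numeric"))[order(name)]
  }
}
```

Whether this pays off depends on the number and size of the files. fread already uses several threads on its own, so it is worth timing both versions before committing to the parallel one.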

setorder is likely to be faster than order – jangorecki, Nov 11 '21 at 21:59
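To illustrate the difference the comment refers to, here is a small sketch on made-up data (the table contents are illustrative only): dt[order(name)] builds a sorted copy, while setorder() reorders the existing table by reference.

```r
library(data.table)

# Made-up example data, not from the question.
dt <- data.table(name = c("b", "c", "a"), class1 = c(2, 3, 1))

sorted_copy <- dt[order(name)]  # new, sorted data.table; dt itself is unchanged
setorder(dt, name)              # reorders dt in place, without making a copy
```

Since setorder() returns its input invisibly, it can also wrap the read directly, as in setorder(fread(file = temp[[i]]), name).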
