
I want to perform several operations intertwining dtplyr and data.table code. My question is whether, having loaded dtplyr, I can apply dplyr verbs to a data.table object and get optimized data.table code as I would with a lazy_dt.

Below are some examples. In each case, would dtplyr translate the pipeline to data.table code, or would plain dplyr do the work?

# Setup for all chunks:
library(dplyr)
library(data.table)
library(dtplyr)

a) setDT

dataframe # class data.frame
setDT(dataframe)

dataframe %>% 
  group_by(id) %>% 
  mutate(rows_per_group = n())

b) data.table object

dt <- as.data.table(dataframe) # or dt <- data.table::fread(filepath)
dt %>%
  group_by(id) %>% 
  mutate(rows_per_group = n())

Also, if all of them make dtplyr work, which is the most efficient option: a), b), or c) using lazy_dt(dataframe)?
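One way I could check this empirically is to inspect the class of each pipeline's result and call show_query() on it. The following is only a minimal sketch: the toy `dataframe` is made up for illustration, and it assumes a recent dtplyr version in which lazy_dt objects carry the class `dtplyr_step`.

```r
library(dplyr)
library(data.table)
library(dtplyr)

# Hypothetical toy data standing in for `dataframe`
dataframe <- data.frame(id = c(1, 1, 2), x = c(10, 20, 30))

# c) explicit lazy_dt: the verbs are recorded, not executed yet
lazy_res <- lazy_dt(dataframe) %>%
  group_by(id) %>%
  mutate(rows_per_group = n())

# Prints the data.table call that dtplyr generated for the pipeline
show_query(lazy_res)

# b) plain data.table fed to the same verbs
dt <- as.data.table(dataframe)
dt_res <- dt %>%
  group_by(id) %>%
  mutate(rows_per_group = n())

# If this is TRUE, dtplyr took over and the result is still lazy;
# if it is FALSE, an ordinary dplyr data frame method did the work
is_lazy <- inherits(dt_res, "dtplyr_step")
print(class(dt_res))

# Force computation of the lazy pipeline and get a data.table back
final <- as.data.table(lazy_res)
```

If `is_lazy` comes out TRUE for b), the same check can be repeated for a) after setDT(dataframe), since setDT also produces a data.table.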

  • 1
    The help of lazy_dt says this: "If you have a data.table, using it with any dplyr generic will automatically convert it to a lazy_dt object" – Leonardo Hansa Mar 29 '23 at 12:16

0 Answers