An R package that implements a data.table back-end for 'dplyr'.
Questions tagged [dtplyr]
45 questions
17 votes, 3 answers
What can't I do with dtplyr that I can in data.table
Should I invest my learning effort for data wrangling in R, specifically between dplyr, dtplyr and data.table?
I use dplyr mostly, but when the data is too big for that I will use data.table, which is a rare occurrence. So now that dtplyr v1.0 is…

dule arnaux
11 votes, 3 answers
Non-equi join in tidyverse
I was wondering whether someone knows if the dplyr extension packages (dbplyr and dtplyr) allow non-equi joins within the usual dplyr workflow? I rarely need data.table, but fast non-equi joins are the only moments where I always need to setDT, then…

b_surial
8 votes, 4 answers
Translating dplyr to data.table
I am trying to translate some dplyr code. I have tried to get help from a package that translates dplyr to data.table, but it still does not work. The error is with row_number from dplyr.
I need all the steps in the dplyr code (even though they…

xhr489
5 votes, 1 answer
pivot_longer gives error when using dtplyr
I have a large dataset I'm trying to tidy using dtplyr. It consists of a large number (>1000) of date-value pairs for various locations. The original uses a pivot_longer, which works fine in dplyr, but gives an error in dtplyr. Is there a way to fix…

s_pike
3 votes, 2 answers
R: Efficient iterative subsetting and filtering of large vector
I'd like to perform the following operation more quickly.
Logic: I have a vector big of 4 elements 1, 2, 3, 4. I also have a same-length vector of thresholds 1.1, 3.1, 4.1, 5.1. I want for each element to find the index of the first next element to…

gaut
2 votes, 2 answers
data.table fill NA by custom function and other cells
Assume we have a data.table like:
library(data.table)
set.seed(123666)
dt <- data.table(
  id = seq(1, 5),
  sample1 = c(sample(c(NA, runif(2))), NA),
  sample2 = c(NA, sample(c(NA, runif(3)))),
  sample3 = c(sample(c(NA, runif(4)))) …

zhang
2 votes, 2 answers
How to filter a data.table based on an uncertain number of conditions?
Given the following data.table in R:
set.seed(123666)
dt <- data.table(sample1 = sample(10),
                 sample2 = sample(10),
                 sample3 = sample(10),
                 sample4 = sample(10),
                 sample5 =…

zhang
2 votes, 1 answer
How can I use data.table in a package without importing all functions?
I'm building an R package in which I would like to use dtplyr to perform various bits of data manipulation. My issue is that dtplyr seems to only work if I import the whole of data.table (i.e. using the roxygen #' @import data.table). Without this I…

wurli
2 votes, 2 answers
use column from function input for group_by variable when using dtplyr
When I try to summarise columns by group using dtplyr, grouping does not seem to work. Since the group variable is an input of my function, I tried using group_by_, only to receive an error message.
Data:
df <- data.frame(
…

EML
2 votes, 0 answers
Pasting data.table code from dtplyr results in slower code
I started migrating some of my code from dplyr to dtplyr today, and as I did so it dawned on me that it would be relatively simple to copy and paste data.table code from the "dtplyr_step_mutate", "dtplyr_step" object produced from calling lazy_dt()…

mooboo
2 votes, 3 answers
Select after a join with conflicting columns with dtplyr
If I run the following trivial example, I get the expected output:
library(dplyr)
library(dtplyr)
library(data.table)
dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))
dt1 %>%
  left_join(
    …

Wasabi
1 vote, 0 answers
Applying dtplyr directly to a data.table instead of a lazy_dt
I want to perform several operations intertwining dtplyr and data.table code. My question is whether, having loaded dtplyr, I can apply dplyr verbs to a data.table object and get optimized data.table code as I would with a lazy_dt.
I here provide…

Alberto Agudo Dominguez
1 vote, 1 answer
R dtplyr using mutated columns in functions
I have data in a lazy_tbl and create columns that I would like to use later to calculate something else, but I'm missing something, as I am getting errors. Here is an example.
library(dtplyr)
library(dplyr)
library(implied)
#helper
possibly_mean <-…

Hakki
1 vote, 0 answers
How to use rowwise with dtplyr
I have the following data frame:
df <- tibble(x = runif(6), y = runif(6), z = runif(6))
The operation I'd like to perform has to use dplyr::rowwise().
library(dplyr)
df <- tibble(x = runif(6), y = runif(6), z = runif(6))
df %>%
  rowwise()…

littleworth
1 vote, 1 answer
Selecting and grouping multiple columns in dtplyr vs dplyr
I'd like to group_by across several variables in dtplyr within an lapply loop, and I find that I can't use the same syntax as in dplyr after calling lazy_dt().
library(dplyr)
mycolumns <- c("Wind", "Month", "Ozone", "Solar.R")
columnpairs <-…

gaut