Questions tagged [dtplyr]

An R package that provides a data.table back-end for dplyr.

45 questions
17
votes
3 answers

What can't I do with dtplyr that I can in data.table

Should I invest my learning effort for data wrangling in R, specifically between dplyr, dtplyr and data.table? I use dplyr mostly, but when the data is too big for that I will use data.table, which is a rare occurrence. So now that dtplyr v1.0 is…
dule arnaux
  • 3,500
  • 2
  • 14
  • 21
11
votes
3 answers

Non-equi join in tidyverse

I was wondering whether someone knows if the dplyr extension packages (dbplyr and dtplyr) allow non-equi joins within the usual dplyr workflow? I rarely need data.table, but fast non-equi joins are the only moments where I always need to setDT, then…
b_surial
  • 512
  • 4
  • 14
8
votes
4 answers

Translating dplyr to data.table

So I am trying to translate some dplyr code. I have tried to get help from a package that translates dplyr to data.table, but it still does not work. The error is with row_number from dplyr. I need all the steps in the dplyr code (even though they…
xhr489
  • 1,957
  • 13
  • 39
5
votes
1 answer

pivot_longer gives error when using dtplyr

I have a large dataset I'm trying to tidy using dtplyr. It consists of a large number (>1000) of date-value pairs for various locations. The original uses a pivot_longer, which works fine in dplyr, but gives an error in dtplyr. Is there a way to fix…
s_pike
  • 1,710
  • 1
  • 10
  • 22
3
votes
2 answers

R: Efficient iterative subsetting and filtering of large vector

I'd like to perform the following operation more quickly. Logic: I have a vector big of 4 elements 1, 2, 3, 4. I also have a same-length vector of thresholds 1.1, 3.1, 4.1, 5.1. I want for each element to find the index of the first next element to…
gaut
  • 5,771
  • 1
  • 14
  • 45
2
votes
2 answers

data.table fill NA by custom function and other cells

Assume we have a data.table like: library(data.table) set.seed(123666) dt <- data.table( id = seq(1, 5), sample1 = c(sample(c(NA, runif(2))), NA), sample2 = c(NA, sample(c(NA, runif(3)))), sample3 = c(sample(c(NA, runif(4)))) …
zhang
  • 185
  • 7
2
votes
2 answers

How to filter a data.table based on an uncertain number of conditions?

Given the following data.table in R: set.seed(123666) dt <- data.table(sample1 = sample(10), sample2 = sample(10), sample3 = sample(10), sample4 = sample(10), sample5 =…
zhang
  • 185
  • 7
2
votes
1 answer

How can I use data.table in a package without importing all functions?

I'm building an R package in which I would like to use dtplyr to perform various bits of data manipulation. My issue is that dtplyr seems to only work if I import the whole of data.table (i.e. using the roxygen #' @import data.table). Without this I…
wurli
  • 2,314
  • 10
  • 17
2
votes
2 answers

use column from function input for group_by variable when using dtplyr

When trying to summarise columns by group using dtplyr, grouping seems to not be working. Since the group variable is an input of my function, I tried using group_by_ only to receive an error message. Data: df <- data.frame( …
EML
  • 615
  • 4
  • 14
2
votes
0 answers

Pasting data.table code from dtplyr results in slower code

I started migrating some of my code from dplyr to dtplyr today, and as I did so it dawned on me that it would be relatively simple to copy and paste data.table code from the "dtplyr_step_mutate", "dtplyr_step" object produced from calling lazy_dt()…
mooboo
  • 29
  • 4
2
votes
3 answers

Select after a join with conflicting columns with dtplyr

If I run the following trivial example, I get the expected output: library(dplyr) library(dtplyr) library(data.table) dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10)) dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10)) dt1 %>% left_join( …
Wasabi
  • 2,879
  • 3
  • 26
  • 48
1
vote
0 answers

Applying dtplyr directly to a data.table instead of a lazy_dt

I want to perform several operations intertwining dtplyr and data.table code. My question is whether, having loaded dtplyr, I can apply dplyr verbs to a data.table object and get optimized data.table code as I would with a lazy_dt. I here provide…
1
vote
1 answer

R dtplyr using mutated columns in functions

I have data in a lazy_tbl and create columns which I would like to use later to calculate something else, but I'm missing something, as I'm getting errors. Here is an example. library(dtplyr) library(dplyr) library(implied) #helper possibly_mean <-…
Hakki
  • 1,440
  • 12
  • 26
1
vote
0 answers

How to use rowwise with dtplyr

I have the following data frame: df <- tibble(x = runif(6), y = runif(6), z = runif(6)) The operation I'd like to perform has to use dplyr::rowwise(). library(dplyr) df <- tibble(x = runif(6), y = runif(6), z = runif(6)) df %>% rowwise()…
littleworth
  • 4,781
  • 6
  • 42
  • 76
1
vote
1 answer

Selecting and grouping multiple columns in dtplyr vs dplyr

I'd like to group_by across several variables in dtplyr within a lapply loop, and I find that I somehow can't use the same syntax as dplyr after calling lazy_dt(). library(dplyr) mycolumns= c("Wind", "Month", "Ozone", "Solar.R") columnpairs <-…
gaut
  • 5,771
  • 1
  • 14
  • 45