5

I have a large dataset I'm trying to tidy using dtplyr. It consists of a large number (>1000) of date-value pairs for various locations. The original uses a pivot_longer, which works fine in dplyr, but gives an error in dtplyr. Is there a way to fix this, maintaining the performance benefits of dtplyr?

This works

library(tidyverse)
library(dtplyr)
library(data.table)

my_data_tb <- tribble(
  ~`date-A`, ~`value-A`, ~`date-B`, ~`value-B`,
  "date1", 1, "date2", 2,
  "date2", 1, "date3", 2 
)

my_data_tb %>% 
  pivot_longer(
  cols = everything(),
  names_to = c(".value", "grid_square"),
  names_sep = "-"
)

but this gives the error:

my_data_dt <- as.data.table(my_data_tb)
my_data_dt <- lazy_dt(my_data_dt)

my_data_dt %>%
  pivot_longer(
    cols = everything(),
    names_to = c(".value", "grid_square"),
    names_sep = "-"
  )

The error message is:

Error: Can't subset elements that don't exist.
x The locations 1 and 2 don't exist.
i There are only 0 elements.
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
Expected 2 pieces. Missing pieces filled with NA in 7 rows [1, 2, 3, 4, 5, 6, 7].
rlang::last_error()
Error: Internal error: Trace data is not square.

UPDATE - it now gives this error message:

Error in UseMethod("pivot_longer") : no applicable method for 'pivot_longer' applied to an object of class "c('dtplyr_step_first', 'dtplyr_step')"

As an aside, this also works, but I think it loses the dtplyr performance gain:

my_data_dt %>%
  as_tibble() %>%
  pivot_longer(
    cols = everything(),
    names_to = c(".value", "grid_square"),
    names_sep = "-"
  )
s_pike
  • 1,710
  • 1
  • 10
  • 22
  • 1
    Just a thought: `pivot_longer` being a `tidyr` verb, not a `dplyr` one, that could explain why it's not been translated. – Aurèle Apr 08 '20 at 12:47
  • Ok, thank you, I hadn't realised that! In which case I probably need a `data.table` solution. From the looks of this: [name-variables-during-multiple-melt-with-data-table](https://stackoverflow.com/questions/57435780/name-variables-during-multiple-melt-with-data-table) there isn't a perfect answer - that points back to `pivot_longer` which doesn't work in this situation! – s_pike Apr 08 '20 at 13:24
  • This should get you close: `melt( as.data.table(my_data_tb), variable.name = "grid_square", measure.vars = patterns(date = "^date-", value = "^value-") )`, but yes, there is the open issue of `A`, `B` being lost to level indices, though it's an easy fix a posteriori – Aurèle Apr 08 '20 at 16:21
  • 1
    Thanks, that's great. For completeness I have: `#melt` `my_data_dt <- my_data_dt %>%` `as.data.table() %>% ` `melt.data.table(measure = patterns("^date-", "^value-"), value.name = c("date", "value"))` `# Fix variable names (conveniently I already have a list of variable names from an ` `# earlier step in my script` `v_names <- c("A", "B")` `my_data_dt$variable <- v_names[my_data_dt$variable]` – s_pike Apr 09 '20 at 09:32
  • 1
    You may also have a look at: https://github.com/markfairbanks/tidytable/ and https://github.com/TysonStanley/tidyfast (not tested myself), they attempt to close that tidyr - data.table gap – Aurèle Apr 09 '20 at 09:48
  • There is now an open issue to add `tidyr` methods to `dtplyr`, which will include `pivot_longer()` – s_pike Apr 19 '21 at 08:27
  • 1
    Link to open issue: https://github.com/tidyverse/dtplyr/issues/168 – s_pike Jun 23 '21 at 09:06
  • 1
    `pivot_longer()` is now included in the development version of `dtplyr`, which can be installed using `devtools::install_github("tidyverse/dtplyr")`. This should work when [v1.2.0 is released](https://github.com/tidyverse/dtplyr/issues/302) (I haven't tested the development version on this issue). – s_pike Nov 19 '21 at 12:02

1 Answers1

2

Dtplyr version 1.2.0 is now available on CRAN, which means this issue is now resolved!

For anyone experiencing this error, check/update your version of dtplyr to ensure you are running >=1.2.0:

install.packages("dtplyr")

(NB. this isn't updated as part of the tidyverse packages so make sure to do it separately)

https://www.tidyverse.org/blog/2021/12/dtplyr-1-2-0/

https://cran.r-project.org/web/packages/dtplyr/index.html

s_pike
  • 1,710
  • 1
  • 10
  • 22