2

I am trying a pivot_longer with multiple variable sets and I'm having trouble getting the syntax right from examples.

My dummy dataset is:

library(dplyr)
library(tidyr)

ID =  c("id-1", "id-2", "id-3")
State = c("MD", "MD", "VA")
Time1Day= c( 1, 12, 30)
Time1Month = c( 1, 4, 5)
Time2Day = c( 9, 21, 13)
Time2Month = c( 12, 4, 5)
Time3Day = c( 7, 14, NA)
Time3Month = c( 1, 2, NA)


df <-data.frame(ID, State, Time1Day, Time1Month, Time2Day, Time2Month, Time3Day, Time3Month)

My desired outcome is:

    ID State  Time Day Month
1 id-1    MD Time1   1     1
2 id-1    MD Time2   9    12
3 id-1    MD Time3   7     1
4 id-2    MD Time1  12     4
5 id-2    MD Time2  21     4
6 id-2    MD Time3  14     2
7 id-3    VA Time1  30     5
8 id-3    VA Time2  13     5

I have looked here and here to try to get the syntax right, and tried the following two solutions, which I cannot get to work:

df.long <- df %>% 
  pivot_longer(cols = starts_with("Time"), names_to = c("Day", "Month"), names_sep="(?=[0-9])"), values_to = "Time", values_drop_na = TRUE)

df.long <- df %>% 
  pivot_longer(cols = ends_with("Day"), names_to = c("Time"), values_to = "Days", values_drop_na = TRUE) %>% 
  pivot_longer(cols = ends_with("Month"), names_to = c("Time"), values_to = "Months", values_drop_na = TRUE)

Any advice on what I am missing and how to fix it would be greatly appreciated

Martin Gal
  • 16,640
  • 5
  • 21
  • 39
tchoup
  • 971
  • 4
  • 11
  • 1
    Does this answer your question? [Reshaping Wide to Long but not reshaping all variables on R](https://stackoverflow.com/questions/68696466/reshaping-wide-to-long-but-not-reshaping-all-variables-on-r) – Martin Gal Nov 01 '21 at 15:20

2 Answers2

4

Edit Added values_drop_na = TRUE thanks to TarJae's comment.

You could use

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-c(ID, State), 
               names_to = c("Time", ".value"),
               names_pattern = "(Time\\d)(.*)",
               values_drop_na = TRUE)

This returns

# A tibble: 9 x 5
  ID    State Time    Day Month
  <chr> <chr> <chr> <dbl> <dbl>
1 id-1  MD    Time1     1     1
2 id-1  MD    Time2     9    12
3 id-1  MD    Time3     7     1
4 id-2  MD    Time1    12     4
5 id-2  MD    Time2    21     4
6 id-2  MD    Time3    14     2
7 id-3  VA    Time1    30     5
8 id-3  VA    Time2    13     5
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • 2
    You could add `values_drop_na = TRUE` then it is exactly the same output! – TarJae Nov 01 '21 at 15:34
  • Thank you, this works like a charm. Quick follow up: what does the .value represent? Is it just saying that Time1, Time2, and Time3 should all fall under the Time heading? – tchoup Nov 01 '21 at 16:25
  • 1
    The `.value` argument takes the matched part of `names_pattern` (in this case: `(.*)`) and uses it as a new column name. So `Time1Day` is split into `Time1` and `Day`, `Time1Month` into `Time1` and `Month` and `pivot_longer` knows where to put the values. – Martin Gal Nov 01 '21 at 16:42
  • @MartinGal is there a way to memorize how this names pattern stuff works. Any tricks? One day I can apply the next day it is complete new to me?! – TarJae Nov 01 '21 at 17:01
  • 1
    @TarJae I'm still struggling with this problem, too. :-( – Martin Gal Nov 01 '21 at 17:15
3

a data.table approach

library(data.table)
# melt to long
DT <- melt(setDT(df), id.vars = c("ID", "State"), variable.factor = FALSE, na.rm = TRUE)
# split variable string
DT[, c("Time", "part2") := tstrsplit(variable, "(?<=[0-9])", perl=TRUE)]
# recast to wide
dcast(DT, ID + State + Time ~ part2, value.var = "value", drop = TRUE)
#      ID State  Time Day Month
# 1: id-1    MD Time1   1     1
# 2: id-1    MD Time2   9    12
# 3: id-1    MD Time3   7     1
# 4: id-2    MD Time1  12     4
# 5: id-2    MD Time2  21     4
# 6: id-2    MD Time3  14     2
# 7: id-3    VA Time1  30     5
# 8: id-3    VA Time2  13     5
Wimpel
  • 26,031
  • 1
  • 20
  • 37