1

I have a Dataframe that looks roughly like this:

my_data <- tribble(
  ~Main, ~VAR1a, ~VAR1b, ~VAR1c, ~VAR2a, ~VAR2b, ~VAR2c,
  "A", "B", "C", "D", 1, 1, 1,
  "B", "A", "D", "", 1, 2, NA,
  "C", "D", "A", "", 2, 1, NA
)
# A tibble: 3 x 7
  Main  VAR1a VAR1b VAR1c VAR2a VAR2b VAR2c
  <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 A     B     C     "D"       1     1     1
2 B     A     D     ""        1     2    NA
3 C     D     A     ""        2     1    NA

I want to pivot this data frame longer based on the VAR1a, VAR1b, and VAR1c. Which is quite easy to do:

my_data %>%
  pivot_longer(names_to = c("VAR1"),
               cols = c("VAR1a", "VAR1b", "VAR1c"))
# A tibble: 9 x 6
  Main  VAR2a VAR2b VAR2c VAR1  value
  <chr> <dbl> <dbl> <dbl> <chr> <chr>
1 A         1     1     1 VAR1a "B"  
2 A         1     1     1 VAR1b "C"  
3 A         1     1     1 VAR1c "D"  
4 B         1     2    NA VAR1a "A"  
5 B         1     2    NA VAR1b "D"  
6 B         1     2    NA VAR1c ""   
7 C         2     1    NA VAR1a "D"  
8 C         2     1    NA VAR1b "A"  
9 C         2     1    NA VAR1c ""  

My problem is, however, that the columns VAR2a, VAR2b, and VAR2c are directly related to VAR1a, VAR1b, and VAR1c (so, the letter after the variable number shows association). For example, VAR2a is the value that Main has assigned to VAR1a and so on.

Ideally I would like to have something that looks like this:

# A tibble: 9 x 4
  Main  value VAR1   VAR2
  <chr> <chr> <chr> <dbl>
1 A     "B"   VAR1a     1
2 A     "C"   VAR1b     1
3 A     "D"   VAR1c     1
4 B     "A"   VAR1a     1
5 B     "D"   VAR1b     2
6 B     ""    VAR1c    NA
7 C     "D"   VAR1a     2
8 C     "A"   VAR1b     1
9 C     ""    VAR1c    NA

I can't figure out how to do this with pivot_longer() or whether it is even possible to do with tidyverse.

Martin Gal
  • 16,640
  • 5
  • 21
  • 39
Zlo
  • 1,150
  • 2
  • 18
  • 38

1 Answers1

3

You can try pivot_longer as -

tidyr::pivot_longer(my_data, 
             cols = starts_with('VAR'), 
             names_to = '.value', 
             names_pattern = '(VAR\\d+)')

#  Main  VAR1   VAR2
#  <chr> <chr> <dbl>
#1 A     "B"       1
#2 A     "C"       1
#3 A     "D"       1
#4 B     "A"       1
#5 B     "D"       2
#6 B     ""       NA
#7 C     "D"       2
#8 C     "A"       1
#9 C     ""       NA
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213