1

Though there is an abundance of 'wide to long' threads for R, I haven't found an answer that will help me with my issue. Any assistance is greatly appreciated!

Example of my dataframe (in wide format):

CODE  NAME  M_2010_1  M_2011_1  M_2012_1  M_2010_3  M_2011_3  M_2012_3
  1     A      10        11        10        9         10       13
  12    B      11        13        15        15        14       11
  8     C       9         2         4        2         8         8

Desired dataframe (in long):

CODE  NAME  YEAR  M1  M3
  1    A    2010  10  9
  1    A    2011  11  10
  1    A    2012  10  13
 12    B    2010  11  15
 12    B    2011  13  14
 12    B    2012  15  11
  8    C    2010   9   2
  8    C    2011   2   8
  8    C    2012   4   8

Thanks in advance!

jay.sf
  • 60,139
  • 8
  • 53
  • 110
Tom H
  • 91
  • 5
  • These posts (https://stackoverflow.com/q/12466493/5325862, https://stackoverflow.com/q/57533341/5325862, https://stackoverflow.com/q/23945350/5325862) should get you most or all of the way there. The only part that might differ in your case is the structure of the column names—you're trying to extract information from multiple places – camille Dec 12 '22 at 16:37

3 Answers3

2

Data

df<-
structure(list(CODE = c(1L, 12L, 8L), NAME = c("A", "B", "C"), 
    M_2010_1 = c(10L, 11L, 9L), M_2011_1 = c(11L, 13L, 2L), M_2012_1 = c(10L, 
    15L, 4L), M_2010_3 = c(9L, 15L, 2L), M_2011_3 = c(10L, 14L, 
    8L), M_2012_3 = c(13L, 11L, 8L)), class = "data.frame", row.names = c(NA, 
-3L))

Code

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols = -c(CODE,NAME),names_sep = "_",names_to = c("aux1","YEAR","aux2")) %>% 
  unite(aux,aux1,aux2,sep = "") %>% 
  pivot_wider(names_from = aux,values_from = value)

Output

# A tibble: 9 x 5
   CODE NAME  YEAR     M1    M3
  <int> <chr> <chr> <int> <int>
1     1 A     2010     10     9
2     1 A     2011     11    10
3     1 A     2012     10    13
4    12 B     2010     11    15
5    12 B     2011     13    14
6    12 B     2012     15    11
7     8 C     2010      9     2
8     8 C     2011      2     8
9     8 C     2012      4     8
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32
  • 1
    You can shorten this by a couple lines while still using the same strategy: use `names_sep` inside `pivot_longer` and you won't need a call to `separate`, and replace `mutate` and `select` with a single `unite` call – camille Dec 12 '22 at 17:59
  • Nice improvments @camille, I updated my code! – Vinícius Félix Dec 12 '22 at 18:09
1

A one liner using reshape which allows to define all in one.

reshape(dat, idv=1:2, var=list(3:5, 6:8), dir='long', timev='YEAR', times=2010:2012, v.n=c('M1', 'M2'))
#           CODE NAME YEAR M1 M2
# 1.A.2010     1    A 2010 10  9
# 12.B.2010   12    B 2010 11 15
# 8.C.2010     8    C 2010  9  2
# 1.A.2011     1    A 2011 11 10
# 12.B.2011   12    B 2011 13 14
# 8.C.2011     8    C 2011  2  8
# 1.A.2012     1    A 2012 10 13
# 12.B.2012   12    B 2012 15 11
# 8.C.2012     8    C 2012  4  8

Data:

dat <- structure(list(CODE = c(1L, 12L, 8L), NAME = c("A", "B", "C"), 
    M_2010_1 = c(10L, 11L, 9L), M_2011_1 = c(11L, 13L, 2L), M_2012_1 = c(10L, 
    15L, 4L), M_2010_3 = c(9L, 15L, 2L), M_2011_3 = c(10L, 14L, 
    8L), M_2012_3 = c(13L, 11L, 8L)), class = "data.frame", row.names = c(NA, 
-3L))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
0

We could do this in pivot_longer after we rearrange the substring in the column names

library(dplyr)
library(stringr)
library(tidyr)
df1 %>% 
  rename_with(~ str_replace(.x, "_(\\d+)_(\\d+)", "\\2_\\1"), 
     starts_with("M_")) %>% 
  pivot_longer(cols = starts_with("M"), 
     names_to = c(".value", "year"), names_sep = "_")

-output

# A tibble: 9 × 5
   CODE NAME  year     M1    M3
  <int> <chr> <chr> <int> <int>
1     1 A     2010     10     9
2     1 A     2011     11    10
3     1 A     2012     10    13
4    12 B     2010     11    15
5    12 B     2011     13    14
6    12 B     2012     15    11
7     8 C     2010      9     2
8     8 C     2011      2     8
9     8 C     2012      4     8

data

df1 <- structure(list(CODE = c(1L, 12L, 8L), NAME = c("A", "B", "C"), 
    M_2010_1 = c(10L, 11L, 9L), M_2011_1 = c(11L, 13L, 2L), M_2012_1 = c(10L, 
    15L, 4L), M_2010_3 = c(9L, 15L, 2L), M_2011_3 = c(10L, 14L, 
    8L), M_2012_3 = c(13L, 11L, 8L)), class = "data.frame", row.names = c(NA, 
-3L))
akrun
  • 874,273
  • 37
  • 540
  • 662