1

I have a wide format data frame abc.csv has variable ID, pc_2007-pc_2011 (value in these are postcode during different year) and rd_2007-rd_2011 (values in these are reveiw date for each year.

ID pc_2007 pc_2008 pc_2009 pc_2010 pc_2011 rd_2007 rd_2008 rd_2009 rd_2010 rd_2011
A 1 4 7 10 13 16 19 22 25 28
B 2 5 8 11 14 17 20 23 26 29
C 3 6 9 12 15 18 21 24 27 30

i want to convert this dataframe into long format

ID year pc rd
A 2007 1 16
A 2008 4 19
A 2009 7 22
user438383
  • 5,716
  • 8
  • 28
  • 43
skpak
  • 79
  • 6
  • 3
    Does this answer your question? [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns) – ktiu Jun 28 '21 at 09:32

3 Answers3

2

You can use names_sep in pivot_longer

df2 <- tidyr::pivot_longer(df1, 
                    cols = -ID, 
                    names_to = c('.value', 'year'), 
                    names_sep = '_')

df2
#   ID    year     pc    rd
#   <chr> <chr> <int> <int>
# 1 A     2007      1    16
# 2 A     2008      4    19
# 3 A     2009      7    22
# 4 A     2010     10    25
# 5 A     2011     13    28
# 6 B     2007      2    17
# 7 B     2008      5    20
# 8 B     2009      8    23
# 9 B     2010     11    26
#10 B     2011     14    29
#11 C     2007      3    18
#12 C     2008      6    21
#13 C     2009      9    24
#14 C     2010     12    27
#15 C     2011     15    30
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

Please see a suggestion below

    # your data
df <- structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6, 
                     pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18, 
                     rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30), class = "data.frame", row.names = c(NA, 
                                                                                                                              -3L))
# packages needed
library(dplyr)
library(tidyr)
library(stringr)

# suggestion
df %>% 
  # your columns names are difficult to work with, I propose you use a "transition" table and use 
  # regular expressions to select elements you need ...
  pivot_longer(cols = 2:last_col(), names_to = "year_code", values_to = "value") %>% 
  mutate(year = str_extract(year_code, "[0-9]{4}$"),
         code = str_extract(year_code, "^[a-z]{2}")) %>% 
  select(-year_code) %>% 
  # ...and then pivot your table back
  pivot_wider(names_from = code, values_from = value)

Output:

 ID    year     pc    rd
   <chr> <chr> <int> <int>
 1 A     2007      1    16
 2 A     2008      4    19
 3 A     2009      7    22
 4 A     2010     10    25
 5 A     2011     13    28
 6 B     2007      2    17
 7 B     2008      5    20
 8 B     2009      8    23
 9 B     2010     11    26
10 B     2011     14    29
11 C     2007      3    18
12 C     2008      6    21
13 C     2009      9    24
14 C     2010     12    27
15 C     2011     15    30
Paul
  • 2,850
  • 1
  • 12
  • 37
0

Here is tidyverse a way with pivot_longer followed by pivot_wider.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(
    cols = -ID,
    names_to = c("name", "year"),
    names_sep = "_"
  ) %>%
  pivot_wider(
    id_cols = c(ID, year),
    names_from = name,
    values_from = value
  )
## A tibble: 15 x 4
#   ID    year     pc    rd
#   <chr> <chr> <int> <int>
# 1 A     2007      1    16
# 2 A     2008      4    19
# 3 A     2009      7    22
# 4 A     2010     10    25
# 5 A     2011     13    28
# 6 B     2007      2    17
# 7 B     2008      5    20
# 8 B     2009      8    23
# 9 B     2010     11    26
#10 B     2011     14    29
#11 C     2007      3    18
#12 C     2008      6    21
#13 C     2009      9    24
#14 C     2010     12    27
#15 C     2011     15    30

Data in dput format

df1 <- 
structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6, 
    pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18, 
    rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30), 
    class = "data.frame", row.names = c(NA, -3L))
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • thanks Rui for your prompt response and code, really helpful. i do apologies for column names. what do you suggest for naming column name? – skpak Jun 28 '21 at 10:41
  • @saleem I don't understand, the column names come from the data itself, why are they wrong? – Rui Barradas Jun 28 '21 at 10:43
  • i made those variable names. i have data on postcode for 2007 to 2017 period, so i called pc_2007, pc_2008, pc_2009 and so on. similarly i have data on annual review date for same ids, so i called rd_2007,rd_2008,rd_2009 and so on – skpak Jun 28 '21 at 11:00
  • @saleem I believe that those names are good, they automatically become the column `year` in the long format. – Rui Barradas Jun 28 '21 at 11:33
  • @ Rui Barrads thanks – skpak Jun 28 '21 at 12:16