converting data wide format data frame to long format in data frame in r

Question

I have a wide format data frame abc.csv has variable ID, pc_2007-pc_2011 (value in these are postcode during different year) and rd_2007-rd_2011 (values in these are reveiw date for each year.

ID	pc_2007	pc_2008	pc_2009	pc_2010	pc_2011	rd_2007	rd_2008	rd_2009	rd_2010	rd_2011
A	1	4	7	10	13	16	19	22	25	28
B	2	5	8	11	14	17	20	23	26	29
C	3	6	9	12	15	18	21	24	27	30

i want to convert this dataframe into long format

ID	year	pc	rd
A	2007	1	16
A	2008	4	19
A	2009	7	22

Does this answer your question? [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns) — ktiu, Jun 28 '21 at 09:32

score 2 · Answer 1 · answered Jun 28 '21 at 10:33

You can use names_sep in pivot_longer

df2 <- tidyr::pivot_longer(df1, 
                    cols = -ID, 
                    names_to = c('.value', 'year'), 
                    names_sep = '_')

df2
#   ID    year     pc    rd
#   <chr> <chr> <int> <int>
# 1 A     2007      1    16
# 2 A     2008      4    19
# 3 A     2009      7    22
# 4 A     2010     10    25
# 5 A     2011     13    28
# 6 B     2007      2    17
# 7 B     2008      5    20
# 8 B     2009      8    23
# 9 B     2010     11    26
#10 B     2011     14    29
#11 C     2007      3    18
#12 C     2008      6    21
#13 C     2009      9    24
#14 C     2010     12    27
#15 C     2011     15    30

Paul · Answer 2 · 2021-06-28T09:57:29.500

Please see a suggestion below

    # your data
df <- structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6, 
                     pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18, 
                     rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30), class = "data.frame", row.names = c(NA, 
                                                                                                                              -3L))
# packages needed
library(dplyr)
library(tidyr)
library(stringr)

# suggestion
df %>% 
  # your columns names are difficult to work with, I propose you use a "transition" table and use 
  # regular expressions to select elements you need ...
  pivot_longer(cols = 2:last_col(), names_to = "year_code", values_to = "value") %>% 
  mutate(year = str_extract(year_code, "[0-9]{4}$"),
         code = str_extract(year_code, "^[a-z]{2}")) %>% 
  select(-year_code) %>% 
  # ...and then pivot your table back
  pivot_wider(names_from = code, values_from = value)

Output:

 ID    year     pc    rd
   <chr> <chr> <int> <int>
 1 A     2007      1    16
 2 A     2008      4    19
 3 A     2009      7    22
 4 A     2010     10    25
 5 A     2011     13    28
 6 B     2007      2    17
 7 B     2008      5    20
 8 B     2009      8    23
 9 B     2010     11    26
10 B     2011     14    29
11 C     2007      3    18
12 C     2008      6    21
13 C     2009      9    24
14 C     2010     12    27
15 C     2011     15    30

score 0 · Answer 3 · answered Jun 28 '21 at 09:53

0

Here is tidyverse a way with pivot_longer followed by pivot_wider.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(
    cols = -ID,
    names_to = c("name", "year"),
    names_sep = "_"
  ) %>%
  pivot_wider(
    id_cols = c(ID, year),
    names_from = name,
    values_from = value
  )
## A tibble: 15 x 4
#   ID    year     pc    rd
#   <chr> <chr> <int> <int>
# 1 A     2007      1    16
# 2 A     2008      4    19
# 3 A     2009      7    22
# 4 A     2010     10    25
# 5 A     2011     13    28
# 6 B     2007      2    17
# 7 B     2008      5    20
# 8 B     2009      8    23
# 9 B     2010     11    26
#10 B     2011     14    29
#11 C     2007      3    18
#12 C     2008      6    21
#13 C     2009      9    24
#14 C     2010     12    27
#15 C     2011     15    30

Data in `dput` format

df1 <- 
structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6, 
    pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18, 
    rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30), 
    class = "data.frame", row.names = c(NA, -3L))

answered Jun 28 '21 at 09:53

Rui Barradas

70,273
8
34
66

thanks Rui for your prompt response and code, really helpful. i do apologies for column names. what do you suggest for naming column name? – skpak Jun 28 '21 at 10:41
@saleem I don't understand, the column names come from the data itself, why are they wrong? – Rui Barradas Jun 28 '21 at 10:43
i made those variable names. i have data on postcode for 2007 to 2017 period, so i called pc_2007, pc_2008, pc_2009 and so on. similarly i have data on annual review date for same ids, so i called rd_2007,rd_2008,rd_2009 and so on – skpak Jun 28 '21 at 11:00
@saleem I believe that those names are good, they automatically become the column `year` in the long format. – Rui Barradas Jun 28 '21 at 11:33
@ Rui Barrads thanks – skpak Jun 28 '21 at 12:16

converting data wide format data frame to long format in data frame in r

3 Answers3

Data in dput format

Data in `dput` format