Convert multiple header columns of a large dataset into rows

Question

I've got the following dataframe (head shown here) and it's messy.

orig     dest   1997    2002      2006   2010     2016    1997    2002      2006   2010     2016 
Seoul   Inchon   543    524       364     452       845     543    524         364    452     845 
Seoul   Gyeongi  543    524       364     452       845     543    524         364    452     845
Inchon  Seoul    543    524       364     452       845     543    524         364    452     845

I want to convert my dataset into the following format (it has got 7000 observations and i have just shown the head of the dataset): I want to get (1997-2016 under one column named "cartrip") and (1997-2016 under another column named "walking")

Year orig  dest      cartrip  walking 
1997 Seoul Incheon   543      543 
2002 Seoul Incheon   524      524  
2006 Seoul Incheon   364      364
2010 Seoul Incheon   452      452
2016 Seoul Incheon   845      845
1997 Seoul Gyeongi   543      543 
2002 Seoul Gyeongi   524      524  
2006 Seoul Gyeongi   364      364
2010 Seoul Gyeongi   452      452
2016 Seoul Gyeongi   845      845

I tried to gather but it didn't work.

Does this answer your question? [Reshaping data.frame from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) — dc37, Apr 23 '20 at 06:25
There is a slight difference! You see there is empty row between org dest & names. — Xaviermoros, Apr 23 '20 at 06:28
Hi Abdirashid, I suspect the people that are answering you are having difficulty because they can't understand the structure of your data. It will be much easier to help if you provide a sample of your data with `dput(dataframe[1:10,])`. You can edit your question and paste the output. You can surround it with three backticks (```) for better formatting. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. — Ian Campbell, Apr 23 '20 at 17:47

Ronak Shah · Answer 1 · 2020-04-23T08:29:09.783

You should not have duplicate column names in the dataframe, we correct that using make.unique.

names(df) <- make.unique(names(df))

We can then remove empty rows and get data in long format using pivot_longer.

library(dplyr)
library(tidyr)

df %>%
  filter(orig != '' | dest != '') %>%
  pivot_longer(cols = -c(orig, dest), 
               names_to = c('.value', 'index'), 
               names_sep = '\\.') %>%
  select(-index)

For the updated dataset we can use :

df %>%
  pivot_longer(cols = -c(orig, dest), names_to = 'year') %>%
  mutate(.copy = c('cartrip', 'walking')[.copy]) %>%
  pivot_wider(names_from = .copy, values_from = value)

#   orig   dest    year  cartrip walking
#   <fct>  <fct>   <chr>   <int>   <int>
# 1 Seoul  Inchon  1997      543     543
# 2 Seoul  Inchon  2002      524     524
# 3 Seoul  Inchon  2006      364     364
# 4 Seoul  Inchon  2010      452     452
# 5 Seoul  Inchon  2016      845     845
# 6 Seoul  Gyeongi 1997      543     543
# 7 Seoul  Gyeongi 2002      524     524
# 8 Seoul  Gyeongi 2006      364     364
# 9 Seoul  Gyeongi 2010      452     452
#10 Seoul  Gyeongi 2016      845     845
#11 Inchon Seoul   1997      543     543
#12 Inchon Seoul   2002      524     524
#13 Inchon Seoul   2006      364     364
#14 Inchon Seoul   2010      452     452
#15 Inchon Seoul   2016      845     845

Thank you so mcuh Ronak. I've tried but it has not worked the way i wanted. I have edited my question. I look forward to your input again. — Xaviermoros, Apr 23 '20 at 08:22

score 0 · Answer 2 · answered Apr 23 '20 at 07:14

a data.table solution. You might need to play around the year. As melt now in data.table cannot handle the year in your question correctly. I guess pivot_longer from tidyr can do this in one shot.

library(data.table)

df <- fread('orig   dest    cartrip cartrip cartrip cartrip cartrip walking walking walking walking walking
        1997    2002    2006    2010    2016    1997    2002    2006    2010    2016
Seoul   Inchon  543 524 364 452 845 543 524 364 452 845
Seoul   Gyeongi 543 524 364 452 845 543 524 364 452 845
Inchon  Seoul   543 524 364 452 845 543 524 364 452 845
')


result <- melt(df[orig!="",],measure.vars = patterns(walking="^walking",cartrip="^cartrip"),variable.name = "year")

result[,year:=forcats::lvls_revalue(year,c("1997", "2002", "2006", "2010", "2016")
)]

result[order(orig,dest)][,.(year,orig,dest,cartrip,walking)]
#>     year   orig    dest cartrip walking
#>  1: 1997 Inchon   Seoul     543     543
#>  2: 2002 Inchon   Seoul     524     524
#>  3: 2006 Inchon   Seoul     364     364
#>  4: 2010 Inchon   Seoul     452     452
#>  5: 2016 Inchon   Seoul     845     845
#>  6: 1997  Seoul Gyeongi     543     543
#>  7: 2002  Seoul Gyeongi     524     524
#>  8: 2006  Seoul Gyeongi     364     364
#>  9: 2010  Seoul Gyeongi     452     452
#> 10: 2016  Seoul Gyeongi     845     845
#> 11: 1997  Seoul  Inchon     543     543
#> 12: 2002  Seoul  Inchon     524     524
#> 13: 2006  Seoul  Inchon     364     364
#> 14: 2010  Seoul  Inchon     452     452
#> 15: 2016  Seoul  Inchon     845     845

^{Created on 2020-04-23 by the reprex package (v0.3.0)}

Thank you for the quick help. But it didn't work and i have edited my question again. I would greatly appreciate your timely help. — Xaviermoros, Apr 23 '20 at 08:21

Convert multiple header columns of a large dataset into rows

2 Answers2