0

I'm trying to transpose a wide dataset to a long tidy one. I use the tidyr::gather() function alot for these kind of tasks, only now I have a pretty weird dataset.

The following is a small version of mine. As you can imagine that the columns with __1 behind them get repeated till number __16 or something in my real datframe. Is this possible to fix with tidyr or dplyr tools?

# A tibble: 1 x 10
   code city      party_short party_long           votes seats party_short__1 party_long__1    votes__1 seats__1
  <dbl> <chr>     <chr>       <chr>                <dbl> <dbl> <chr>          <chr>               <dbl>    <dbl>
1  3630 Amsterdam PVDA        Partij van de Arbeid  1833  5.00 HARLBEL        Harlinger Belang       942     2.00 

And for reproducability:

library(tidyverse)

df <- tibble(code = 3630,
        city = "Amsterdam",
        party_short = "PVDA",
        party_long = "Partij van de Arbeid",
        votes = 1833,
        seats = 5,
        party_short__1 = "HARLBEL",
        party_long__1 = "Harlinger Belang",
        votes__1 = 942,
        seats__1 = 2)

With a desired output:

# A tibble: 2 x 6
   code city      party_short party_long           votes seats
  <dbl> <chr>     <chr>       <chr>                <dbl> <dbl>
1  3630 Amsterdam PVDA        Partij van de Arbeid  1833  5.00
2  3630 Amsterdam HARLBEL     Harlinger Belang       942  2.00
Tdebeus
  • 1,519
  • 5
  • 21
  • 43
  • Just to be sure; columns `code` and `city` does not have other columns with `__#`? (i.e. `code` and `city` stay the same for all rows in the desired output) – Shique Apr 10 '18 at 12:18
  • Yes, only `party_short`, `party_long`, `votes` and `seats` are repeated (but with different values of course) – Tdebeus Apr 10 '18 at 12:20
  • Does this answer your question? [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns) – camille Jan 08 '20 at 04:09

2 Answers2

1

We can gather all columns, separate the column based on "__", and then spread the data frame.

library(tidyverse)

df2 <- df %>%
  gather(Column, Value, -code, -city) %>%
  separate(Column, into = c("Column", "Number"), sep = "__") %>%
  spread(Column, Value) %>%
  select(-Number)
df2
# # A tibble: 2 x 6
#   code city      party_long           party_short seats votes
#   <dbl> <chr>     <chr>                <chr>       <chr> <chr>
# 1 3630. Amsterdam Harlinger Belang     HARLBEL     2     942  
# 2 3630. Amsterdam Partij van de Arbeid PVDA        5     1833
www
  • 38,575
  • 12
  • 48
  • 84
0

I am using a combination of data.table and tidyr below

library(data.table)
library(tidyr)

setDT(df)

melt(df, id.vars = c('code', 'city')) %>% separate(variable, c('vv', 'bb'), '__') %>%
  dcast(code + city + bb ~ vv, value.var = 'value') %>% mutate(bb = NULL)

  code      city           party_long party_short seats votes
1 3630 Amsterdam     Harlinger Belang     HARLBEL     2   942
2 3630 Amsterdam Partij van de Arbeid        PVDA     5  1833
Felipe Alvarenga
  • 2,572
  • 1
  • 17
  • 36