0

I have a df where building is the id for each row. I would like to collapse by id and every value of that id to be stored in separate columns.

dat <- data.frame(building = c(1,1,2,3,4,2), 
                  name = c("Townshall", "Townshall", "George Way", "Main", "Celerybuilding", "George Way"), 
                  wing = c("A2", "A3", "A5", "B3","C5", "D1"), 
                  year_build = c("09-06-1820", "10-12-1920", "02-02-1927", "19-06-1921", "25-09-2002", "29-01-1980"))

  building           name wing year_build
1        1      Townshall   A2 09-06-1820
2        1      Townshall   A3 10-12-1920
3        2     George Way   A5 02-02-1927
4        3           Main   B3 19-06-1921
5        4 Celerybuilding   C5 25-09-2002
6        2     George Way   D1 29-01-1980

I would like to get (column name doesnt really matter):

  building           name wing wing.2 year_build year_build.2
1        1      Townshall   A2     A3 09-06-1820   10-12-1920
2        2     George Way   A5     D1 02-02-1927   29-01-1980
3        3           Main   B3   <NA> 19-06-1921         <NA>
4        4 Celerybuilding   C5   <NA> 25-09-2002         <NA>

Aggregate worked, but I cant seem to figure out how to get separate columns

aggregate(. ~ building, dat, paste, collapse = "; ")

  building                   name   wing             year_build
1        1   Townshall; Townshall A2; A3 09-06-1820; 10-12-1920
2        2 George Way; George Way A5; D1 02-02-1927; 29-01-1980
3        3                   Main     B3             19-06-1921
4        4         Celerybuilding     C5             25-09-2002

Does anybody have any suggestions?

Maya
  • 579
  • 3
  • 12

1 Answers1

4

You could create a new column to store the number of values in each building and cast the data in wide format. Using tidyverse this can be done as :

library(dplyr)
library(tidyr)

dat %>%
  group_by(building) %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = row, values_from = c(wing, year_build))

#  building name           wing_1 wing_2 year_build_1 year_build_2
#     <dbl> <chr>          <chr>  <chr>  <chr>        <chr>       
#1        1 Townshall      A2     A3     09-06-1820   10-12-1920  
#2        2 George Way     A5     D1     02-02-1927   29-01-1980  
#3        3 Main           B3     NA     19-06-1921   NA          
#4        4 Celerybuilding C5     NA     25-09-2002   NA          

Or with data.table :

library(data.table)
dcast(setDT(dat), building+name~rowid(building),
      value.var = c('wing', 'year_build'))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213