
I have a data frame with repeated rows for some bikes, and I want to gather the branches for each bike into one column. The columns are bike, branches (the company's countries), and manager.

df <- data.frame("bike"     = c("Harley-Davidson", "Triumph", "BMW", "BMW", "Triumph"),
                 "branches" = c("USA", "UK", "GER", "FRA", "USA"),
                 "manager"  = c("Roy", "Beth", "Arnold", "Arnold", "Beth"))

> df

bike             branches  manager
Harley-Davidson  USA       Roy
Triumph          UK        Beth
BMW              GER       Arnold
BMW              FRA       Arnold
Triumph          USA       Beth

I want to gather the branches into one field this way:

bike             branches  manager
Harley-Davidson  USA       Roy
Triumph          UK, USA   Beth
BMW              GER, FRA  Arnold

The usual long-to-wide reshaping strategy does not produce this result.

Forge
    `df %>% group_by(bike) %>% summarise(branches=toString(unique(branches)), manager=first(manager))` – A. Suliman Oct 12 '19 at 17:33
  • Hmmmm... other than reporting, I never understand the analytical usefulness of nested data in columns. – Parfait Oct 12 '19 at 18:00
  • @Parfait, I've run into some data structures/functions in bioinformatics that require unique genomic ranges. If you plan to use the information that would be nested later, you could join it, of course, but sometimes it's just as easy to pass it through as a string. – GenesRus Oct 13 '19 at 21:36

1 Answer

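library(dplyr)

# For each bike, collapse every remaining column to a comma-separated string of its unique values
df %>%
    group_by(bike) %>%
    summarise_all(function(x) toString(unique(x)))
# In dplyr >= 1.0, summarise_all() is superseded; the equivalent is
# summarise(across(everything(), ~ toString(unique(.x))))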
## A tibble: 3 x 3
#  bike            branches manager
#  <fct>           <chr>    <chr>  
#1 BMW             GER, FRA Arnold 
#2 Harley-Davidson USA      Roy    
#3 Triumph         UK, USA  Beth   
d.b
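If you would rather not load dplyr, the same collapse can be sketched in base R with `aggregate()`, applying the answer's `toString(unique(x))` to each column within each bike. This is an alternative sketch, not part of the original answer; the result name `res` is illustrative. It should work whether the columns are factors (the `data.frame()` default before R 4.0) or character, because `toString()` pastes factor values as their labels.

```r
# Example data from the question (factor or character columns, depending on R version)
df <- data.frame(bike     = c("Harley-Davidson", "Triumph", "BMW", "BMW", "Triumph"),
                 branches = c("USA", "UK", "GER", "FRA", "USA"),
                 manager  = c("Roy", "Beth", "Arnold", "Arnold", "Beth"))

# Group the non-key columns by bike and collapse each group's unique values
# into a single comma-separated string
res <- aggregate(df[c("branches", "manager")],
                 by  = list(bike = df$bike),
                 FUN = function(x) toString(unique(x)))
print(res)
```

As with the dplyr version, the rows come back sorted by `bike`, and the branches keep the order in which they first appear in the original data (for example "GER, FRA" for BMW).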