
I'm trying to wrap my head around restructuring (reshaping) operations with dplyr, but I can't solve this one. Maybe one of you can help :)

    df <- data.frame(
      gene = c("ABC", "ABC", "AA", "AB", "AC", "DD", "DE", "AA", "AR", "ABC"),
      genotype = c("ht", "cpht", "ht", "cpht", "hm", "hm", "cpht", "ht", "hm", "cpht"),
      consequence = c("utr3", "miss", "miss", "stop", "utr5", "miss", "stop", "miss", "utr3", "utr5")
    )

I would like to create a new data frame that looks like this: [image: df I want]

Supposedly this is easy to do with dplyr, but I can't get it to work. Maybe one of you can?

Thanks a lot! Sebastian

Sebastian Hesse

1 Answer


You could try this:

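    library(dplyr)
    library(tidyr)  # spread() comes from tidyr, not dplyr

    df %>%
      group_by(gene, genotype) %>%
      summarise(consequence = paste(consequence, collapse = ",")) %>%  # one string per gene/genotype pair
      spread(genotype, consequence)  # each genotype becomes a column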

## A tibble: 7 x 4
## Groups:   gene [7]  
#  gene  cpht      hm    ht       
#  <fct> <chr>     <chr> <chr>    
#1 AA    <NA>      <NA>  miss,miss
#2 AB    stop      <NA>  <NA>     
#3 ABC   miss,utr5 <NA>  utr3     
#4 AC    <NA>      utr5  <NA>     
#5 AR    <NA>      utr3  <NA>     
#6 DD    <NA>      miss  <NA>     
#7 DE    stop      <NA>  <NA>
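To see what spread() actually receives, it can help to stop the pipeline above right after summarise(). A minimal sketch, assuming dplyr >= 1.0 for the `.groups` argument (used here only to return an ungrouped result); `long` is just an illustrative name:

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE is set explicitly
# so the columns are character vectors on any R version.
df <- data.frame(
  gene = c("ABC", "ABC", "AA", "AB", "AC", "DD", "DE", "AA", "AR", "ABC"),
  genotype = c("ht", "cpht", "ht", "cpht", "hm", "hm", "cpht", "ht", "hm", "cpht"),
  consequence = c("utr3", "miss", "miss", "stop", "utr5", "miss", "stop", "miss", "utr3", "utr5"),
  stringsAsFactors = FALSE
)

# First half of the pipeline: one row per (gene, genotype) pair,
# with that pair's consequences joined into one comma-separated string.
# .groups = "drop" returns an ungrouped tibble.
long <- df %>%
  group_by(gene, genotype) %>%
  summarise(consequence = paste(consequence, collapse = ","), .groups = "drop")

print(long)
```

This gives 8 rows, because ABC occurs with two genotypes (ht and cpht). spread(genotype, consequence) then turns each distinct genotype into its own column, which is why those two ABC rows end up in the single ABC row of the output above.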

Your data, as given in your post:

    df <- data.frame(
      gene = c("ABC", "ABC", "AA", "AB", "AC", "DD", "DE", "AA", "AR", "ABC"),
      genotype = c("ht", "cpht", "ht", "cpht", "hm", "hm", "cpht", "ht", "hm", "cpht"),
      consequence = c("utr3", "miss", "miss", "stop", "utr5", "miss", "stop", "miss", "utr3", "utr5")
    )
    df
#   gene genotype consequence
#1   ABC       ht        utr3
#2   ABC     cpht        miss
#3    AA       ht        miss
#4    AB     cpht        stop
#5    AC       hm        utr5
#6    DD       hm        miss
#7    DE     cpht        stop
#8    AA       ht        miss
#9    AR       hm        utr3
#10  ABC     cpht        utr5
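In current tidyr, spread() is superseded by pivot_wider(), which can do the pasting itself through its `values_fn` argument. A sketch of that variant, assuming tidyr >= 1.1 (where `values_fn` accepts a plain function and `names_sort` exists). Unlike spread(), pivot_wider() keeps rows and columns in order of first appearance, so `names_sort = TRUE` and arrange(gene) are added here to reproduce the sorted layout shown above:

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with character columns on any R version.
df <- data.frame(
  gene = c("ABC", "ABC", "AA", "AB", "AC", "DD", "DE", "AA", "AR", "ABC"),
  genotype = c("ht", "cpht", "ht", "cpht", "hm", "hm", "cpht", "ht", "hm", "cpht"),
  consequence = c("utr3", "miss", "miss", "stop", "utr5", "miss", "stop", "miss", "utr3", "utr5"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  pivot_wider(
    names_from = genotype,                            # one column per genotype
    values_from = consequence,                        # cells are filled from consequence
    values_fn = function(x) paste(x, collapse = ","), # join repeats, e.g. AA/ht -> "miss,miss"
    names_sort = TRUE                                 # columns cpht, hm, ht (alphabetical)
  ) %>%
  arrange(gene)                                       # rows sorted by gene

print(wide)
```

Gene/genotype combinations that never occur are filled with NA, as with spread(), and the result is an ungrouped tibble, so the `Groups: gene` line does not appear.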
Nicolas2