I have a dataset that looks like the following:

Invoice  Pizza  Pasta  Soda  Cake
1        NA     pasta  NA    NA
1        NA     NA     NA    cake
2        pizza  NA     NA    NA
2        NA     pasta  NA    NA

I want to group it by Invoice and get the following output:

Invoice  Pizza  Pasta  Soda  Cake
1        NA     pasta  NA    cake
2        pizza  pasta  NA    NA

I'm trying to use dplyr's group_by(Invoice) %>% summarize(), but I can't get the desired output. Could you suggest a good method? Thanks!

ANP
  • Is there always only one non-`NA` value per group in every column? – LAP Feb 13 '19 at 07:38
  • See https://stackoverflow.com/questions/28036294/collapsing-rows-where-some-are-all-na-others-are-disjoint-with-some-nas – Cyrus Mohammadian Feb 13 '19 at 07:41
  • @LAP Yes, there is only one value other than NA. When a cell is not NA, its value is the same as the column name. – ANP Feb 13 '19 at 07:43
  • @ANP The link Cyrus posted should solve your question. – LAP Feb 13 '19 at 07:44
  • @Cyrus The linked post is helpful, but my data is non-numeric, so how do I sum over the rows? – ANP Feb 13 '19 at 07:46
  • There is a solution in the linked post that works with non-numeric data: https://stackoverflow.com/a/28036595/680068 – zx8754 Feb 13 '19 at 08:07

1 Answer

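library(dplyr)

# Paste each column's values within an invoice, then strip one "NA" entry.
# A formula lambda replaces funs(), which is deprecated as of dplyr 0.8.0.
df %>% group_by(Invoice) %>% 
       summarise_all(~ sub('NA,|,NA', '', paste(., collapse = ',')))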

# A tibble: 2 x 5
  Invoice Pizza Pasta Soda  Cake 
    <int> <chr> <chr> <chr> <chr>
1       1 NA    pasta NA    cake 
2       2 pizza pasta NA    NA
A. Suliman
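
The answer above pastes the values together and strips one "NA" entry with sub(), so an all-NA column ends up as the string "NA" rather than a real missing value, and a group with more than two rows would keep leftover "NA" text. As an alternative, here is a minimal sketch in base R that keeps the first non-NA value per invoice. The data frame is rebuilt from the question's table, and the helper name `first_non_na` is made up for illustration; it assumes, as the asker confirmed, that each column has at most one distinct non-NA value per group.

```r
# Sample data reconstructed from the question (an assumption about column types).
df <- data.frame(
  Invoice = c(1L, 1L, 2L, 2L),
  Pizza   = c(NA, NA, "pizza", NA),
  Pasta   = c("pasta", NA, NA, "pasta"),
  Soda    = rep(NA_character_, 4),
  Cake    = c(NA, "cake", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first non-NA value, or NA if there is none.
first_non_na <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) > 0) vals[1] else NA_character_
}

# Collapse every non-key column to one row per Invoice.
result <- aggregate(df[-1], by = list(Invoice = df$Invoice), FUN = first_non_na)
print(result)
```

With dplyr 1.0 or later, the same helper could probably be plugged into the original pipeline as `df %>% group_by(Invoice) %>% summarise(across(everything(), first_non_na))`, which keeps real NA values instead of "NA" strings.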