1

I have a data frame:

df <- data.frame(sample_names=c("foo","bar","foo, bar"), sample_values=c(1,5,3))
df

  sample_names sample_values
1          foo             1
2          bar             5
3     foo, bar             3

and I want a resulting data.frame of the following shape:

  sample_names sample_values
1          foo             4
2          bar             8

Is there a elegant way to achieve this? My workaround would be to grep by "," and somehow fidly add the result to the existing rows. Since I want to apply this on multiple dataframes, I'd like to come up with an easier solution. Any ideas for this?

Phil
  • 7,287
  • 3
  • 36
  • 66
tacoman
  • 882
  • 6
  • 10

1 Answers1

2

We can use separate_rows to split the column, then do a group by operation to get the sum

library(dplyr)
library(tidyr)
df %>%
   separate_rows(sample_names) %>% 
   group_by(sample_names) %>% 
   summarise(sample_values = sum(sample_values), .groups = 'drop')

-output

# A tibble: 2 x 2
#  sample_names sample_values
#  <chr>                <dbl>
#1 bar                      8
#2 foo                      4

Or with base R by splitting the column with strsplit into a list of vectors, then use tapply to do a group by sum

lst1 <- strsplit(df$sample_names, ",\\s+")
tapply(rep(df$sample_values, lengths(lst1)), unlist(lst1), FUN = sum)
akrun
  • 874,273
  • 37
  • 540
  • 662