I have a two-column dataset mydf (built below as a character matrix). I want to count how often each value occurs in the combination column and list the matching sample IDs, as in the result shown below.

   mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
            "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
            "FGGY*", "LIN28A*DFNB59", "AK2*"), .Dim = c(6L, 2L), .Dimnames = list(
                NULL, c("customer_sample_id", "combination")))

Desired result:

combination      frequency    customer_sample_id
ANKRD30BL*       1 sample     AMLM12001KP 
WDR70*NXPH1      2 sample     AMLM120XP, AMLM12001KP
FGGY*            1 sample     1231401
LIN28A*DFNB59    1 sample     1231401
AK2*             1 sample     1231401 
MAPK

1 Answer

With base R:

aggregate(customer_sample_id ~ combination, data = mydf,
          FUN = function(x) c(length(x), paste(x, collapse = ",")))

or with data.table:

library(data.table)
mydt <- as.data.table(mydf)
mydt[, .(freq = .N, customer_sample_id = paste(customer_sample_id, collapse = ",")), by = combination]

or with dplyr:

library(dplyr)
data.frame(mydf) %>% 
  group_by(combination) %>% 
  summarise(freq = n(), customer_sample_id = paste(customer_sample_id, collapse = ","))
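
One caveat on the base R version above: because FUN returns a vector of length two, aggregate stores both values in a single matrix column. If separate frequency and customer_sample_id columns are wanted, a minimal sketch is to aggregate twice and merge. The names df, freq, ids and res are only illustrative, and the ", " separator is chosen to match the desired output in the question.

```r
# Sample data from the question (a character matrix), converted to a data frame
mydf <- structure(c("AMLM12001KP", "AMLM120XP", "AMLM12001KP", "1231401",
                    "1231401", "1231401", "ANKRD30BL*", "WDR70*NXPH1", "WDR70*NXPH1",
                    "FGGY*", "LIN28A*DFNB59", "AK2*"), .Dim = c(6L, 2L),
                  .Dimnames = list(NULL, c("customer_sample_id", "combination")))
df <- as.data.frame(mydf, stringsAsFactors = FALSE)

# Number of rows per combination
freq <- aggregate(customer_sample_id ~ combination, data = df, FUN = length)
names(freq)[2] <- "frequency"

# Sample ids per combination, collapsed into one string
ids <- aggregate(customer_sample_id ~ combination, data = df,
                 FUN = function(x) paste(x, collapse = ", "))

# Join the two summaries on the grouping column
res <- merge(freq, ids, by = "combination")
res
```

The rows of res come back sorted by combination rather than in order of first appearance, so reorder them afterwards if the original order matters.
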
Jaap