I have this data frame:

df<-data.frame(ID=c(1,1,2,2,2),A=c(1,2,1,2,3),B=c("A","T","T","A","G"))

  ID A B
1  1 1 A
2  1 2 T
3  2 1 T
4  2 2 A
5  2 3 G

and I need this summary table:

summary_df <- data.frame(ID = c(1,2), sort_factor_and_combin_B = c("A-T","A-G-T"))

  ID sort_factor_and_combin_B
1  1                      A-T
2  2                    A-G-T

1. Regardless of the order of column A, I want to create a column that, for each ID, concatenates the values of column B in alphabetical order.

2. At the same time, I also want a column that joins the B values according to the order of A.

Do you have any idea?

Thank you!

h-y-jp

2 Answers

We can use tapply() to sort and paste the B values within each ID:

tmp1 <- tapply(df$B, df$ID, function(x){
  paste(sort(x), collapse = "-")
})

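# cbind to the desired format; note that this yields a character matrix,
# and unique(df$ID) only lines up with tmp1 when the IDs already appear in sorted order
cbind("ID" = unique(df$ID),
      "sort_factor_and_combin_B" = tmp1)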

#   ID  sort_factor_and_combin_B
# 1 "1" "A-T"                   
# 2 "2" "A-G-T"  
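
Since cbind() returns a character matrix here, a data frame may be more convenient downstream. The following is only a minimal sketch, assuming the same df as in the question. Taking the IDs from names(tmp1) instead of unique(df$ID) is an adjustment of mine so that the IDs always line up with the groups, and the name result is purely illustrative.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 A  = c(1, 2, 1, 2, 3),
                 B  = c("A", "T", "T", "A", "G"))

# One alphabetically sorted, "-"-joined string per ID
tmp1 <- tapply(df$B, df$ID, function(x) paste(sort(x), collapse = "-"))

# tapply() names each element after its group, so take the IDs from there
# (result is a hypothetical name, not part of the original answer)
result <- data.frame(ID = as.numeric(names(tmp1)),
                     sort_factor_and_combin_B = as.vector(tmp1))
result
```

This keeps ID numeric and gives "A-T" for ID 1 and "A-G-T" for ID 2.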
fabla

For each ID, sort the B values and paste them together.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(sort_factor_and_combin_B = paste0(sort(B), collapse = '-'))

#     ID sort_factor_and_combin_B
#* <dbl> <chr>                   
#1     1 A-T                     
#2     2 A-G-T                 

Base R aggregate():

aggregate(B~ID, df, function(x) paste0(sort(x), collapse = '-'))
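
The second request in the question, joining B in the order given by A, is not covered by the code here. One possible base R sketch, assuming the question's df, is shown below. The names df_by_A, by_A, sorted, combined and the column name B_by_A are made up for illustration and do not come from the question.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 A  = c(1, 2, 1, 2, 3),
                 B  = c("A", "T", "T", "A", "G"))

# Sort rows by ID, then by A within each ID, so B is already in A's order
df_by_A <- df[order(df$ID, df$A), ]

# Join B per ID as-is (no sort), which preserves the order of A
by_A <- aggregate(B ~ ID, df_by_A, function(x) paste(x, collapse = "-"))
names(by_A)[2] <- "B_by_A"  # hypothetical column name

# Alphabetical version, same idea as the aggregate() call above
sorted <- aggregate(B ~ ID, df, function(x) paste(sort(x), collapse = "-"))
names(sorted)[2] <- "sort_factor_and_combin_B"

# Put both columns side by side, matched on ID
combined <- merge(sorted, by_A, by = "ID")
combined
```

For ID 2 this gives "A-G-T" in the alphabetical column and "T-A-G" in the A-ordered column.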
Ronak Shah