
I am looking for the most frequent combinations of values in my data frame. However, I am not only after whole-row combinations; partial combinations within a row should be counted too. Here is my table:

ID <- 1:5
A = c("a", "a", "", "", "a")
B = c("b", "b", "", "", "")
C = c("c", "c", "c", "c", "c")
D = c("d", "d", "", "", "d")

df <- data.frame(ID, A, B, C, D, stringsAsFactors=FALSE)

and here is what we get:

> df
  ID A B C D
1  1 a b c d
2  2 a b c d
3  3     c  
4  4     c  
5  5 a   c d

What I am after is this:

Comb    Freq
a,b,c,d   2
a,c,d     3
a,c       3
a,d       3
c,d       3
b,c       2

Any suggestions?

Ronak Shah

1 Answer


Try this

table(apply(df[, -1], 1, function(x) paste(x[x != ""], collapse = ",")))
Bing
  • Thank you, this works to some extent. However, it only counts occurrences of the whole sequence in each row; it ignores partially matched combinations such as "a,c" and "a,d" from my example. I am aware of other solutions, but they do not answer my question completely. – Wojciech Bednarz Oct 18 '18 at 09:57
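
One possible way to also count the partial combinations asked for in the question is to enumerate, for each row, every subset of its non-empty values with combn() and then tabulate all of them together. The following is only a minimal sketch under a few assumptions: empty cells are encoded as "" as in the question, combinations must contain at least two values, and the helper name row_combos and its min_size argument are hypothetical names made up for this illustration.

```r
# Example data from the question
df <- data.frame(
  ID = 1:5,
  A = c("a", "a", "", "", "a"),
  B = c("b", "b", "", "", ""),
  C = c("c", "c", "c", "c", "c"),
  D = c("d", "d", "", "", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all combinations of at least `min_size`
# non-empty values in one row, each collapsed to a string like "a,c"
row_combos <- function(x, min_size = 2) {
  x <- x[x != ""]                       # drop empty cells
  if (length(x) < min_size) return(character(0))
  unlist(lapply(seq(min_size, length(x)), function(k) {
    combn(x, k, FUN = paste, collapse = ",")
  }))
}

# Collect the combinations of every row (ID column excluded)
all_combos <- unlist(lapply(seq_len(nrow(df)), function(i) {
  row_combos(unlist(df[i, -1]))
}))

# Tabulate and sort by descending frequency
counts <- table(all_combos)
result <- data.frame(
  Comb = names(counts),
  Freq = as.vector(counts),
  stringsAsFactors = FALSE
)
result <- result[order(-result$Freq, result$Comb), ]
print(result, row.names = FALSE)
```

For the example data this gives a frequency of 3 for "a,c", "a,d", "c,d" and "a,c,d", and 2 for "a,b,c,d" and "b,c", which agrees with the table in the question. It also lists combinations the question did not show, such as "a,b". Note that the number of subsets grows exponentially with the number of non-empty columns per row, so this approach is only practical for fairly narrow data frames.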