Count unique rows irrespective of column order

Question

EDIT
My original question was poorly worded, so I have rewritten it to make it more useful to others. It already has an answer.

sample data.frame:

set.seed(10) 
df <- data.frame(a = sample(1:3, 30, replace = TRUE), b = sample(1:3, 30, replace = TRUE), c = sample(1:3, 30, replace = TRUE))

My question:

I have several columns (a, b, c in my example). Similar to, but different from, this question asked by R-user, I would like to count the distinct 'value sets' across these three columns (in general, n columns), irrespective of the order of the values within each row.

count(df,a,b,c) from dplyr does not help:

library(dplyr)
count(df,a,b,c)
    # A tibble: 17 x 4
           a     b     c     n
       <int> <int> <int> <int>
     1     1     1     1     1
     2     1     1     2     2
     ...
     7     2     1     1     4
     ...

In this example, rows 2 and 7 contain the same set of values (1,1,2). That is not what I want: I do not care about the order of the values within a set, so '1,1,2' and '2,1,1' should be treated as the same set. How can I count those value sets?

EDIT 2: The neat trick in @Mouad_S's answer is to first sort the values within each row with apply(), then transpose the result with t(), after which count() can be used on the resulting columns.
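A minimal sketch of that trick in base R, on a small hand-made data frame. The name `toy` and its values are illustrative assumptions, not part of the sample data above. `apply()` returns each sorted row as a column, which is why the transpose is needed.

```r
# Hypothetical toy data: rows 1 and 2 hold the same values in a different order
toy <- data.frame(a = c(1, 2, 3), b = c(1, 1, 3), c = c(2, 1, 1))

# Sort the values within each row; unname() drops the column names,
# which no longer describe the values once they are reordered
sorted_cols <- apply(toy, 1, function(x) sort(unname(x)))

# apply() stores each sorted row as a column, so transpose back
# to get one row per observation
sorted_rows <- t(sorted_cols)
sorted_df <- as.data.frame(sorted_rows)

# Rows 1 and 2 are now identical, so they collapse into one value set
n_sets <- nrow(unique(sorted_df))
print(sorted_df)
print(n_sets)  # 2
```

Once the rows are sorted, any ordinary grouping or de-duplication step treats '1,1,2' and '2,1,1' as the same row.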

Please use `set.seed` so your example is reproducible, and also show desired output corresponding to the example. For guidance, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 — Frank, Aug 14 '17 at 14:38

Mouad_Seridi · Accepted Answer · 2018-01-15T01:05:17.363

0

library(dplyr)

set.seed(10)
df <- data.frame(a = sample(1:3, 30, replace = TRUE),
                 b = sample(1:3, 30, replace = TRUE),
                 c = sample(1:3, 30, replace = TRUE))

## the old answer: counts per value set, but with the three columns hard-coded
count(data.frame(t(apply(df, 1, sort))), X1, X2, X3)

## the new answer: number of distinct value sets, for any number of columns
t(apply(df, 1, sort)) %>%  # sort the values within each row
  as.data.frame() %>%      # turn the resulting matrix into a data frame
  distinct() %>%           # keep the unique rows
  nrow()                   # count them

[1] 9
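If the count per value set is needed for any number of columns (the old answer names three columns explicitly, and the new one returns only the number of distinct sets), one possible approach is sketched below. It uses base R `paste()` and `table()` rather than dplyr's `count()`. The names `key` and `set_counts` are illustrative assumptions, not part of the answer above.

```r
set.seed(10)
df <- data.frame(a = sample(1:3, 30, replace = TRUE),
                 b = sample(1:3, 30, replace = TRUE),
                 c = sample(1:3, 30, replace = TRUE))

# Build one order-independent key per row, e.g. "1,1,2";
# this works for any number of columns because no column is named
key <- apply(df, 1, function(x) paste(sort(x), collapse = ","))

# Tabulate the keys: one row per value set, with its frequency in column n
set_counts <- as.data.frame(table(key), responseName = "n",
                            stringsAsFactors = FALSE)
print(set_counts)
```

`nrow(set_counts)` then gives the number of distinct value sets, and `set_counts$n` sums to the number of rows in `df`. The exact counts depend on the random sample, so none are shown here.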

edited Jan 15 '18 at 01:05

answered Aug 14 '17 at 14:46

Mouad_Seridi



Why not simply `count(df, a, b, c)`? By using mutate you will have duplicate rows. – Axeman Aug 14 '17 at 14:54
That works too, and it's definitely more concise. My instinct was to leave the original data frame intact (in case one needs the row number). – Mouad_Seridi Aug 14 '17 at 14:58
Thanks for your quick answer! Unfortunately, this solution does not give the result I am looking for. The result is the same as with group_by(df,a,b,c), where the order of a, b, c matters and each ordering counts as its own group, which is what I do not want. I realise the numbers in my example df were too random to make the point clear, and I am sorry for not having used set.seed. I have corrected the df now. – tjebo Aug 14 '17 at 15:12
It is not clear what you are looking for. Do you want the unique combinations irrespective of order? – Mouad_Seridi Aug 14 '17 at 15:15
`count(data.frame(t(apply(df, 1, function(x) sort(x)))), X1, X2, X3)` – Mouad_Seridi Aug 14 '17 at 15:23

R doesn't have a count function. Use `library(whatever)` to show where it comes from. Also, most answers are better with some words in the answer. – Frank Aug 14 '17 at 15:31
Please see the edit above. I rewrote this question, which was badly asked at first. The answer works with three columns, but not with n columns. ... – tjebo Jan 15 '18 at 00:37
I edited the answer, it should work now with any number of columns. – Mouad_Seridi Jan 15 '18 at 01:06
Thanks very much @Mouad_S for editing your own answer so neatly. I also realise now that there was a flaw in my thinking, and only now understand why you are sorting rows and not columns. Thanks again. – tjebo Jan 15 '18 at 08:55
