0

I have a dataframe with 3 columns: A, B, C In each column there is values going from 0 to 10 I want to figure out how often each possible size orders occurs e.g.:

A>B>C #?
A>C>B #?
B>A>C #?
B>C>A #?
C>A>B #?
C>B>A #?

When dealing with only two columns this is rather easy and can be done as follows:

df$c <- as.numeric(df$a > df$b)

I was wondering whether there is something simlar for 3 or more columns (i.e. a formula that asigns n! values; in the case of n being 3: a 1,2,3,4,5,6 value based on the order observed)

Jeroen
  • 79
  • 1
  • 9
  • 3
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 04 '20 at 19:32
  • Do you need a combination `set.seed(24) ; df <- as.data.frame(matrix(sample(0:10, 3 * 50, replace = TRUE), 50, 3, dimnames = list(NULL, LETTERS[1:3])));combn(df, 2, function(x) sum(x[1] > x[2]))` – akrun Feb 04 '20 at 19:33

2 Answers2

1

Using the sample data from akrun

I made a simple function that returns the combination of letters in the right order and then a simple table

set.seed(24)    
df <- as.data.frame(
  matrix(
    sample(0:10, 3 * 50, replace = TRUE), 
    50, 3, dimnames = list(NULL, LETTERS[1:3])))

order_ABC <- function(a, b, c) {
  ls <- c(a, b, c)
  names(ls) <- c("a", "b", "c")
  paste0(names(sort(ls)), collapse = "")
}

df$order <- mapply(order_ABC, df$A, df$B, df$C)

table(df$order)

> abc acb bac bca cab cba 
>  7  15   8   8   5   7 
Gallarus
  • 476
  • 3
  • 9
1

It depends on how you want to handle ties, but this seems like a good use case for matrixStats::rowRanks: Here, I inverted the ranks, so 1 designates the highest number in the row.

set.seed(24)    
df <- as.data.frame(
    matrix(
        sample(0:10, 3 * 50, replace = TRUE), 
        50, 3, dimnames = list(NULL, LETTERS[1:3])))
table(apply(
    matrixStats::rowRanks(-1* as.matrix(df), ties.method = "min"), 
    1, paste, collapse= ""))
#> 
#> 113 122 123 131 132 212 213 221 231 312 321 
#>   2   3   5   4   5   2   8   1   3  10   7

The labels are showing the negative ranks (from largest to smallest) for the columns A, B and C (thus 1 > 2 > 3) in that same order. 122 means A (ranked first) > B = C (both ranked second), 312 means B > C > A, etc..

As you can see in this example, A>B>C (123) occurs only 5 times, A=B>C (113) 2 times and A>B=C (122) 3 times.

user12728748
  • 8,106
  • 2
  • 9
  • 14