Suppose that I have a data frame where each column is a method and each row holds a metric value for that method (the lower the better).

+----------+----------+
| Method 1 | Method 2 |
+----------+----------+
|        1 |        2 |
|        2 |        3 |
+----------+----------+

I would like to obtain a data frame with the counts of wins and losses between all pairs of methods (potentially more than just two). A method wins if it has a smaller metric than the other one, like this:

+----------+-----------+-----------+-----------+-----------+
|          | Method 1+ | Method 1- | Method 2+ | Method 2- |
+----------+-----------+-----------+-----------+-----------+
| Method 1 |         - |         - |         0 |         2 |
| Method 2 |         2 |         0 |         - |         - |
+----------+-----------+-----------+-----------+-----------+

Here, "+" after the method name indicates that the method won, and "-" that it lost.

The trivial way would be to iterate over each row of the data frame and compare every pair of columns, but that is quite inefficient.

Is there a more elegant solution in R?

Natanael Ramos
  • Including a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question will increase your chances of getting an answer. – Samuel Oct 19 '17 at 23:10
  • You also might want to outline how your win/loss logic works, or how you get from your input to your desired result. – Jake Kaupp Oct 19 '17 at 23:25

1 Answer

You actually don't need that many cells to keep all the same information: the Method 2 row of Method 1+ (Method 1 beats Method 2 x number of times) will always be equal to the Method 1 row of Method 2- (Method 2 loses to Method 1 x number of times). A single square matrix of win counts is therefore enough, and we can get it like so:

# First we make a function to count the wins in two columns
# (this will be useful later to feed to apply)
count_wins <- function(columns, data) {
    return(sum(data[ , columns[1]] < data[ , columns[2]]))
}
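# Then we set the seed for some reproducible data
# (R >= 3.6.0 changed the default sampling algorithm, so reproducing the
# values shown below there needs set.seed(123, sample.kind="Rounding"))
set.seed(123)
# Create some random example data
df <- data.frame(method1=sample(1:10, 5, replace=TRUE),
                 method2=sample(1:10, 5, replace=TRUE),
                 method3=sample(1:10, 5, replace=TRUE))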
#   method1 method2 method3
# 1       3       1      10
# 2       8       6       5
# 3       5       9       7
# 4       9       6       6
# 5      10       5       2
# We make an empty matrix to store results
result <- matrix(NA, nrow=ncol(df), ncol=ncol(df))
# Create a matrix of all column pairings
combos <- combn(x=ncol(df), m=2)
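# And use apply, matrix indexing, and count_wins to fill the matrix.
# Each row of t(combos) is a (row, column) position in result; indexing
# this way stays correct for any number of columns, whereas upper.tri
# fills column by column and matches combn's order only up to three columns
result[t(combos)] <- apply(combos, 2, count_wins, df)
flipped <- combos[2:1, , drop=FALSE]
result[t(flipped)] <- apply(flipped, 2, count_wins, df)
# Then we just name the rows and columns after the methods
rownames(result) <- colnames(result) <- names(df)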
#         method1 method2 method3
# method1      NA       1       2
# method2       4      NA       1
# method3       3       3      NA

This gives us a matrix where each cell tells us how many times the row method beat the column method. For example, here method1 beats method2 once and method3 twice, while method2 beats method1 four times and method3 once, etc. Ties (such as row 4 of the data, where method2 and method3 both score 6) count as a win for neither method.
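If the wide layout from the question is still wanted, it can be derived from the win matrix alone, because the loss counts are just its transpose. The following is a minimal sketch, not part of the original answer: the win counts are typed in by hand from the output above so the block runs on its own, and the names `wins`, `plus`, `minus`, and `wide` are made up for illustration.

```r
# Win counts copied from the result above: wins[i, j] is the number of
# rows in which method i has a smaller metric than method j.
methods <- paste0("method", 1:3)
wins <- matrix(c(NA, 4, 3,    # column method1
                 1, NA, 3,    # column method2
                 2, 1, NA),   # column method3
               nrow = 3, dimnames = list(methods, methods))

# In the question's layout, row r / column "c+" is how often c beat r,
# and row r / column "c-" is how often c lost to r.
plus <- t(wins)
colnames(plus) <- paste0(methods, "+")
minus <- wins
colnames(minus) <- paste0(methods, "-")

# Interleave the columns: method1+, method1-, method2+, method2-, ...
n <- length(methods)
wide <- cbind(plus, minus)[, c(rbind(seq_len(n), n + seq_len(n)))]
print(wide)
```

Reading the `method1` row of `wide`, the `method2+` cell is 4 and the `method2-` cell is 1, which matches method2 beating method1 four times and losing once.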

I don't know if this is the "elegant" solution you're looking for, but it avoids looping over the rows (each pairwise comparison is vectorized, and apply only iterates over the column pairs) and gives you a smaller results matrix with all the same information.
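As a possible alternative, the same win matrix can be built without `combn` by comparing the whole data matrix against one column at a time. This is only a sketch under a few assumptions: `win_matrix` is a hypothetical helper name, the data frame has at least two numeric columns, and there are no missing values. The example data is typed in by hand to match the values printed above.

```r
# Hypothetical helper (not from the answer): wins[i, j] counts the rows in
# which method i has a strictly smaller metric than method j.
win_matrix <- function(df) {
  m <- as.matrix(df)
  # m < m[, j] compares every column with column j in one vectorized step;
  # colSums then counts, per method, the rows in which it beat method j.
  wins <- sapply(seq_len(ncol(m)), function(j) colSums(m < m[, j]))
  dimnames(wins) <- list(colnames(m), colnames(m))
  diag(wins) <- NA  # a method is not compared with itself
  wins
}

# Same values as the example data shown above
df <- data.frame(method1 = c(3, 8, 5, 9, 10),
                 method2 = c(1, 6, 9, 6, 5),
                 method3 = c(10, 5, 7, 6, 2))
wins <- win_matrix(df)
print(wins)
```

This loops over columns only once rather than over every pair, at the cost of comparing each column with itself; whether that is actually faster would need to be measured on real data.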

duckmayr