Given the following sample, I'd like to iterate across each column. For the 3x3 matrix, the result should be a ratio based on how many elements in each column match.

m <- data.frame("c1" = c(1,0,0), "c2" = c(1,1,0), "c3" = c(0,0,1))

Example code to check columns:
> m[,1] == m
> m[,2] == m
> m[,3] == m

Output for m[,1] == m

    c1 c2 c3
[1,] T, T, F
[2,] T, F, F
[3,] T, T, F

I'd like to sum up all of the results respectively.

Example, for m[,1]: 1 + 2/3 + 1/3

I think a nested loop could solve this problem.

swbro123
  • What is `c1`? Your matrix has `d1` (...) and `col1` (...). – r2evans Jul 25 '19 at 19:41
  • updated to reflect correct column headers... – swbro123 Jul 25 '19 at 20:21
  • Is this "by row"? What is your expected output for the above matrix? "5/6" doesn't make sense to me, as each row only has 3 unique comparisons, and given binary data as above, you'll always have one of 1 or 1/3. – r2evans Jul 25 '19 at 20:26
  • *Please provide your expected output for each row in your sample data.* (You cannot get 2/3 in this example, since by its transitive nature if col1=col2 and col2=col3 then always col1=col3.) – r2evans Jul 25 '19 at 22:42
  • I'm comparing each column to the other. So compare col1 to col2 and col2 to col1. Now compare col1 to col3 and col3 to col1. And finally compare col2 to col3 and col3 to col2. If the values in the first col are found in the matching column it should return a match. In this case, col1==col3 and col2==col3. I'll add more data to help state this idea more clearly. – swbro123 Jul 25 '19 at 22:47
  • In your example, if you are comparing columns then no two columns are the same ... 0. If you are doing this by row, then are you expecting a vector of: `c(3,1,3,1,1)/3`? If not, then I'm really confused (because I'm still thinking there should be a value per row of this sample data). – r2evans Jul 25 '19 at 22:54
  • I'm trying to compare commonality amongst columns. It won't be an exact fit. So for example, both col1 and col2 "share commonality" with col 3 because column 3 contains a 1 in the same elements as those two columns. However, that same commonality can't be said about col1 to col2 or even col3 to col1/col2. Does that help explain what I'm trying to do I hope? – swbro123 Jul 26 '19 at 00:16
  • Does my answer work? You shouldn't need a `for` loop for this, I think. – r2evans Jul 26 '19 at 04:29
  • `m[,1] == m[,1:3]` is not what you show, your data and code and output are inconsistent. Are you dealing with matrices or with frames? Not that it makes a huge difference, but consistency is helpful. – r2evans Jul 26 '19 at 19:25
  • Updated code. Listen, if you can't figure it out just stop lol "is it a matrix or df?" – swbro123 Jul 26 '19 at 23:28
  • I hear you, and while it may seem like I'm splitting hairs, methods that work well on a `matrix` do not always work as well (or at all) on frames. And since your edited code *still* is different (`m[,1] == m` produces different results for me), your question has generally been confusing, both in the logic you are trying to implement as well as the still-unstated expected output for your sample data. Want faster answers? *Please* read about how to ask a question *well*, it really makes a difference for both of us: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve. – r2evans Jul 26 '19 at 23:54
  • Sounds good, this has been a learning process for me. The requirements constantly change based on logical results. I'll take a look at the link. – swbro123 Jul 27 '19 at 14:54

1 Answer

Edit, guessing at the expected answer:

class(m)
# [1] "data.frame"
m
#   c1 c2 c3
# 1  1  1  0
# 2  0  1  0
# 3  0  0  1

m[,1] == m # different than you have in your question
#        c1    c2    c3
# [1,] TRUE  TRUE FALSE
# [2,] TRUE FALSE  TRUE
# [3,] TRUE  TRUE FALSE

apply(m, 2, function(a) sum(colSums(a == m) / length(a)))
#       c1       c2       c3 
# 2.000000 1.666667 1.333333 

or if you don't want "0" to count:

apply(m, 2, function(a) sum(colSums(a == m & a > 0) / length(a)))
#        c1        c2        c3 
# 0.6666667 1.0000000 0.3333333 
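
For comparison, here is a sketch of the same computation written as an explicit loop over columns, which may make it easier to see what the `apply` calls above are doing. It assumes the sample data frame from the question, converted with `as.matrix`; the helper name `match_ratio` and its `ignore_zero` argument are hypothetical, introduced only for illustration.

```r
# Sample data from the question, converted to a numeric matrix.
m <- data.frame(c1 = c(1, 0, 0), c2 = c(1, 1, 0), c3 = c(0, 0, 1))
mat <- as.matrix(m)

# Hypothetical helper: for each column, sum the fraction of rows in which
# that column agrees with every column of the matrix (itself included).
match_ratio <- function(mat, ignore_zero = FALSE) {
  out <- setNames(numeric(ncol(mat)), colnames(mat))
  for (j in seq_len(ncol(mat))) {
    a <- mat[, j]
    hits <- mat == a                          # a is recycled down each column
    if (ignore_zero) hits <- hits & (a > 0)   # drop rows where a itself is 0
    out[j] <- sum(colSums(hits)) / nrow(mat)
  }
  out
}

match_ratio(mat)                      # c1 = 2, c2 = 5/3, c3 = 4/3
match_ratio(mat, ignore_zero = TRUE)  # c1 = 2/3, c2 = 1, c3 = 1/3
```

Because each column is also compared with itself, the first variant always includes a contribution of 1 per column.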

Previous answer:

m <- matrix(c(1,0,0,1,0,1,1,0,0,1,1,1,0,1,1), ncol=3)
m
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    0    1    1
# [3,]    0    0    0
# [4,]    1    0    1
# [5,]    0    1    1
apply(m, 1, function(x) sum(sapply(table(x), choose, 2))) / ncol(m)
# [1] 1.0000000 0.3333333 1.0000000 0.3333333 0.3333333
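
As a cross-check on the row-wise approach, the sketch below counts equal pairs directly: `table(x)` tallies how often each value occurs in a row, and `choose(n, 2)` is the number of pairs among `n` equal entries. The names `m5` and `equal_pairs` are hypothetical. Note that dividing by `ncol(m)` matches the number of column pairs only because there are 3 columns (`choose(3, 2)` is 3); the sketch assumes that dividing by the number of pairs is the intent.

```r
# Same 5x3 sample matrix as above, under a separate (hypothetical) name.
m5 <- matrix(c(1,0,0,1,0, 1,1,0,0,1, 1,1,0,1,1), ncol = 3)

# Hypothetical helper: count the column pairs whose values agree within one row.
equal_pairs <- function(x) {
  pairs <- combn(length(x), 2)          # every pair of column indices
  sum(x[pairs[1, ]] == x[pairs[2, ]])
}

by_pairs <- apply(m5, 1, equal_pairs) / choose(ncol(m5), 2)
by_table <- apply(m5, 1, function(x) sum(sapply(table(x), choose, 2))) / ncol(m5)

all.equal(by_pairs, by_table)
```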
r2evans
  • This function is perfectly fine for averaging rows, but I want to compare columns. – swbro123 Jul 26 '19 at 12:28
  • swbro123, your expected output of "2/6" (for both the 3x3 matrix and the interim 5x3 matrix you had at one point) does not make sense to me in any form. Your pseudocode of `col1 == col3` is neither meta-accurate (it will return a vector of length `nrow(mtx)`) nor ever true (no two columns have ever been identical), so I have no idea how any sample data you've provided can reduce to a single non-zero "commonality". So ***please***, clear up your logic (to explain how `col1==col3`) or something else ... b/c this is not working. – r2evans Jul 26 '19 at 16:58
  • OK, I understand your concern. I made adjustments to expose more of my logic. – swbro123 Jul 26 '19 at 17:31
  • After seeing your second answer that's exactly what is needed. However, now I see where the problem lies in my expected output versus R output. The columns are returning True when both rows contain a matching zero. I can't have that type of return. I think we'll have to change the elements to T, F and only count true to get the expected output. Also, can I simply subtract the output by 1? Because c1 = 1. c2 = 2/3. and c3 = 1/3. – swbro123 Jul 27 '19 at 15:06
  • See my edit, swbro123. "Spiral development" is hard to control and hard to deal with, too. – r2evans Jul 27 '19 at 15:45
  • I agree 100%, r2evans. Technically, your answer works for testing the first column. I will work on a way to iterate it across the rest of the problem set to include columns 2 and 3. – swbro123 Jul 27 '19 at 20:18
  • I don't understand (shocker :-), it's iterating over all three columns: it compares the first column with the whole matrix, then the second column with the whole matrix, etc. – r2evans Jul 27 '19 at 23:06