I am trying to create a heatmap in order to visualize matches and mismatches between some predicted and expected values.

Suppose `a` is the data frame containing the predicted values and `b` the one containing the expected values:

a = rbind(sample(0:1, size = 14, replace = TRUE), sample(0:1, size = 14, replace = TRUE))
b = rbind(sample(0:1, size = 40, replace = TRUE), sample(0:1, size = 40, replace = TRUE))

How can I create a third data frame that contains only the columns common to `a` and `b` and that holds:

  • a certain value when a value is the same in the two data frames
  • another value if the predicted value was 0 and the expected 1
  • another value if the predicted value was 1 and the expected 0.
Ronak Shah
Rina
  • what do you mean by the "common columns"? – Vasily A Nov 05 '20 at 13:21
  • The two data frames do not have the same number of columns, but all the columns of the smaller dataframe are part of the bigger one. – Rina Nov 05 '20 at 13:35
  • I don't understand, sorry. Could you post what the expected output would be? The first two columns of `a` are both `c(1,0)` and occur in the second dataframe numerous times. – MKR Nov 05 '20 at 13:41
  • The comparison would be performed among the corresponding columns and rows of the two data frames. For example, V1 in A versus V1 in B. What I would like is to compare A[1,V1] and B[1,V1]. If they have the same value, in a third data frame, let's say C, I would like to have a column called V1 and a pre-defined value that denotes that the values between A and B for this specific column match. Does that make sense? – Rina Nov 05 '20 at 14:12

1 Answer

## your example data are matrices, 
## let's make them data frames:
a = as.data.frame(a)
b = as.data.frame(b)

common_cols = intersect(names(a), names(b))

## see where they are equal
## TRUE means equal, FALSE means not equal
a[common_cols] == b[common_cols]
#         V1   V2    V3   V4    V5    V6   V7    V8    V9   V10   V11   V12   V13  V14
# [1,]  TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE TRUE
# [2,] FALSE TRUE  TRUE TRUE  TRUE FALSE TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE TRUE

## see the difference
## 0 means a and b are equal
## 1 means a is 1 and b is 0
## -1 means a is 0 and b is 1
a[common_cols] - b[common_cols]
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
# 1  0  0 -1  0  1 -1  0  1  0  -1   1   0  -1   0
# 2  1  0  0  0  0  1  0  0 -1   0  -1  -1   0   0
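
If you need your own values rather than 0, 1 and -1, one possible follow-up is to use the difference as a lookup key. This is only a sketch: the seed, the `labels` vector and its three strings (`"match"`, `"missed"`, `"false_alarm"`), and the names `d` and `result` are placeholders I made up for illustration, not anything from the question.

```r
set.seed(1)  # arbitrary seed (an assumption) so the sketch is reproducible

a <- as.data.frame(rbind(sample(0:1, size = 14, replace = TRUE),
                         sample(0:1, size = 14, replace = TRUE)))
b <- as.data.frame(rbind(sample(0:1, size = 40, replace = TRUE),
                         sample(0:1, size = 40, replace = TRUE)))

common_cols <- intersect(names(a), names(b))

# a - b is 0 (match), 1 (predicted 1, expected 0) or -1 (predicted 0, expected 1)
d <- as.matrix(a[common_cols] - b[common_cols])

# Hypothetical labels: swap in whatever three values you need
labels <- c("-1" = "missed", "0" = "match", "1" = "false_alarm")

# Look up each difference by name, then restore the original shape
result <- as.data.frame(matrix(labels[as.character(d)],
                               nrow = nrow(d),
                               dimnames = dimnames(d)))
result
```

For the heatmap itself, the numeric matrix `d` is probably the easier input; something like `image(t(d))` in base R should give a rough first picture, with one colour per outcome.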
Gregor Thomas
  • That worked nicely! Thanks! However, I noticed that for a value that is the same between the two data frames I get a FALSE. However when I specifically check for equality by indexing the specific value in both data frames I get a TRUE... Any idea? – Rina Nov 06 '20 at 12:00
  • Three guesses (a) if your data is not `integer`, you could be hitting a floating point precision issue - [see this FAQ on the subject](https://stackoverflow.com/q/9508518/903061). (b) You have class issues - maybe comparing factors with different levels, or comparing a factor to a numeric. (c) You have a typo and aren't actually comparing the right rows/columns. If you share the actual data in question, something like `dput(your_real_a[relevant_row, relevant_column, drop = FALSE])` and similarly for `b`, I can take a look and do more than guess. – Gregor Thomas Nov 06 '20 at 14:05
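
Guesses (a) and (b) from the comment above can be checked with a few lines. This is only a sketch on made-up toy values (the names `x`, `y`, `a_fac`, `b_num` and `a_num` are hypothetical, not the asker's data); it shows what each problem looks like and one common way around it.

```r
# (a) Floating point: two values that print the same need not be identical
x <- 0.1 + 0.2
y <- 0.3
x == y                    # FALSE: exact comparison of doubles
isTRUE(all.equal(x, y))   # TRUE: comparison within a small tolerance
abs(x - y) < 1e-8         # TRUE: element-wise version (tolerance is an arbitrary choice)

# (b) Class issues: check what each column actually is before comparing
a_fac <- data.frame(V1 = factor(c("0", "1")))  # toy "predicted" column stored as factor
b_num <- data.frame(V1 = c(0, 1))              # toy "expected" column stored as numeric
sapply(a_fac, class)
sapply(b_num, class)

# Convert a factor via character first; as.numeric() on a factor
# returns the internal level codes (1, 2, ...), not the labels
a_num <- as.data.frame(lapply(a_fac, function(col) as.numeric(as.character(col))))
a_num == b_num
```

If `sapply(your_data, class)` shows anything other than `integer` or `numeric` for the columns being compared, that would point towards guess (b).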