Create a data frame with matches and mismatches between two data frames

Question

I am trying to create a heatmap in order to visualize matches and mismatches between some predicted and expected values.

Suppose a is the data frame containing the predicted values and b the one containing the expected values:

a = rbind(sample(0:1, size = 14, replace = TRUE), sample(0:1, size = 14, replace = TRUE))
b = rbind(sample(0:1, size = 40, replace = TRUE), sample(0:1, size = 40, replace = TRUE))

How can I create a third data frame that contains only the columns common to a and b and holds:

- a certain value when a value is the same in the two data frames,
- another value if the predicted value was 0 and the expected value was 1,
- another value if the predicted value was 1 and the expected value was 0?

The two data frames do not have the same number of columns, but all the columns of the smaller dataframe are part of the bigger one. — Rina, Nov 05 '20 at 13:35
I don't understand, sorry. Could you post what the expected output would be? The first two columns of `a` are both `c(1,0)` and occur in the second dataframe numerous times. — MKR, Nov 05 '20 at 13:41
The comparison would be performed among the corresponding columns and rows of the two data frames. For example, V1 in A versus V1 in B. What I would like is to compare A[1,V1] and B[1,V1]. If they have the same value, in a third data frame, let's say C, I would like to have a column called V1 and a pre-defined value that denotes that the values between A and B for this specific column match. Does that make sense? — Rina, Nov 05 '20 at 14:12

Gregor Thomas · Accepted Answer · 2020-11-05T14:29:43.507

## your example data are matrices, 
## let's make them data frames:
a = as.data.frame(a)
b = as.data.frame(b)

common_cols = intersect(names(a), names(b))

## see where they are equal
## TRUE means equal, FALSE means not equal
a[common_cols] == b[common_cols]
#         V1   V2    V3   V4    V5    V6   V7    V8    V9   V10   V11   V12   V13  V14
# [1,]  TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE TRUE
# [2,] FALSE TRUE  TRUE TRUE  TRUE FALSE TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE TRUE

## see the difference
## 0 means a and b are equal
## 1 means a is 1 and b is 0
## -1 means a is 0 and b is 1
a[common_cols] - b[common_cols]
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
# 1  0  0 -1  0  1 -1  0  1  0  -1   1   0  -1   0
# 2  1  0  0  0  0  1  0  0 -1   0  -1  -1   0   0
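
To get the pre-defined values the question asks for, one option is to recode the difference through a lookup vector. This is a minimal sketch, not part of the original answer: the inputs are small made-up data frames rather than the random data above, and the labels `"match"`, `"pred1_exp0"` and `"pred0_exp1"` are hypothetical placeholders.

```r
## hypothetical small inputs; `a` is predicted, `b` is expected
a <- data.frame(V1 = c(1, 0), V2 = c(0, 0), V3 = c(1, 1))
b <- data.frame(V1 = c(1, 1), V2 = c(1, 0), V3 = c(0, 1), V4 = c(0, 1))

common_cols <- intersect(names(a), names(b))

## -1, 0 or 1, with the same meaning as in the answer above
d <- as.matrix(a[common_cols] - b[common_cols])

## lookup keyed by the difference; the labels are placeholders
labels <- c("-1" = "pred0_exp1", "0" = "match", "1" = "pred1_exp0")

## index the lookup by name, then restore the shape and column names
res <- as.data.frame(
  matrix(labels[as.character(d)], nrow = nrow(d), dimnames = dimnames(d))
)
res
```

For plotting, the numeric matrix `d` may be the more convenient form, since base functions such as `image()` accept a numeric matrix.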

edited Nov 05 '20 at 14:29

answered Nov 05 '20 at 14:22

That worked nicely! Thanks! However, I noticed that for a value that is the same in the two data frames I get a FALSE, yet when I check for equality by indexing that specific value in both data frames I get a TRUE... Any idea? – Rina Nov 06 '20 at 12:00
Three guesses: (a) if your data is not `integer`, you could be hitting a floating point precision issue - [see this FAQ on the subject](https://stackoverflow.com/q/9508518/903061). (b) You have class issues - maybe comparing factors with different levels, or comparing a factor to a numeric. (c) You have a typo and aren't actually comparing the right rows/columns. If you share the actual data in question, something like `dput(your_real_a[relevant_row, relevant_column, drop = FALSE])` and similarly for `b`, I can take a look and do more than guess. – Gregor Thomas Nov 06 '20 at 14:05
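
Guesses (a) and (b) can be illustrated on made-up values. This is only a sketch of the kind of check that might help; the asker's real data was not shared, so `x`, `y`, `f`, `a` and `b` below are stand-ins.

```r
## (a) floating point: values that look the same need not be identical
x <- 0.1 + 0.2
y <- 0.3
x == y                      # FALSE
isTRUE(all.equal(x, y))     # TRUE, compares within a tolerance

## (b) class issues: a factor's underlying integer codes are not its labels
f <- factor(c("0", "1"))
as.numeric(f)                 # 1 2, the codes
as.numeric(as.character(f))   # 0 1, the labels

## checking column classes in both data frames can reveal such a mismatch
a <- data.frame(V1 = c(0, 1))
b <- data.frame(V1 = factor(c("0", "1")))
sapply(a, class)
sapply(b, class)
```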
