
I need to compare two data frames that describe the same things but were obtained in different ways.

So I need a data frame in which every value is compared with the corresponding value in the other data frame, giving TRUE if the values are identical and FALSE if they aren't.

Here is an example to explain it better:

df1

>    1  2  3  
> 1 AT GC CC 
> 2 AG GC CT 
> 3 GG TT <NA>

df2

>    1  2   3  
> 1 AT <NA> GG 
> 2 AG  GC  CG 
> 3 GG  TT  AA

result

>      1     2     3  
> 1 TRUE <NA>  FALSE 
> 2 TRUE TRUE  FALSE 
> 3 TRUE TRUE  <NA>

I've seen an answer here:

Comparing two similar dataframes and finding different values between them

but it doesn't work on my data if one of the data frames has an NA (R gave me TRUE).

Also, I expected that changing the order of the data frames in mapply() would give the same result, but that's not true in my case. The data frames also have different factor levels, so df1==df2 doesn't work.

I would also like to ask how I can count the FALSE values in the result. Is there something like is.na()?

thank you all

mppd

2 Answers


We can just use == to get a logical matrix

(df1 == df2) & !is.na(df1) & !is.na(df2)
#    1     2     3
#1 TRUE FALSE FALSE
#2 TRUE  TRUE FALSE
#3 TRUE  TRUE FALSE

If the columns are factor class, then we can compare colwise with mapply/Map

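mapply(function(x, y) {
  i1 <- as.character(x) == as.character(y)  # compare as character, so factor levels don't matter
  replace(i1, is.na(i1), FALSE)             # a comparison involving NA counts as FALSE
}, df1, df2)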
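
A minimal sketch of the factor case, assuming the question's values are stored as factor columns (the construction of `df1` and `df2` below is a guess at the asker's data, not taken from the post). It shows the direct comparison failing and the column-wise character comparison working:

```r
# Assumed example data: the values from the question, stored as factors
df1 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c("GC", "GC", "TT"),
                  `3` = c("CC", "CT", NA),
                  check.names = FALSE, stringsAsFactors = TRUE)
df2 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c(NA, "GC", "TT"),
                  `3` = c("GG", "CG", "AA"),
                  check.names = FALSE, stringsAsFactors = TRUE)

# Column 3 has different level sets in df1 and df2, so == on the
# factor columns stops with an error; capture the message instead
err <- tryCatch({ df1 == df2; NULL },
                error = function(e) conditionMessage(e))

# Comparing column by column as character avoids the level check
res <- mapply(function(x, y) {
  i1 <- as.character(x) == as.character(y)
  replace(i1, is.na(i1), FALSE)
}, df1, df2)
```

Here `res` is a logical matrix with one column per data frame column, and any cell where either side is NA comes out FALSE.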

Or compare as matrix and then convert the NA to FALSE

m1 <- as.matrix(df1) == as.matrix(df2)
m1[is.na(m1)] <- FALSE
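
For the NA-preserving table shown in the question, and for counting the FALSE cells, one possible sketch is to skip the replacement step and count with `sum()`. The data construction and the names `m`, `n_false`, `n_na` and `n_true` are illustrative assumptions, not from the post:

```r
# Assumed example data: the values from the question, as character columns
df1 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c("GC", "GC", "TT"),
                  `3` = c("CC", "CT", NA),
                  check.names = FALSE, stringsAsFactors = FALSE)
df2 <- data.frame(`1` = c("AT", "AG", "GG"),
                  `2` = c(NA, "GC", "TT"),
                  `3` = c("GG", "CG", "AA"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Without the replacement step, a cell is NA whenever either side is NA
m <- as.matrix(df1) == as.matrix(df2)

n_false <- sum(!m, na.rm = TRUE)  # mismatches, ignoring NA cells
n_na    <- sum(is.na(m))          # cells where at least one side is NA
n_true  <- sum(m, na.rm = TRUE)   # matches
```

If the positions are needed rather than the count, `which(!m, arr.ind = TRUE)` returns the row and column indices of the FALSE cells; `which()` skips NA cells.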
akrun
  • I've got this error: `Error in Ops.factor(left, right) : level sets of factors are different` – mppd May 02 '17 at 11:57
  • @mppd It is better to compare `character` class columns, i.e. `df1[] <- lapply(df1, as.character); df2[] <- lapply(df2, as.character)` and then do the comparison – akrun May 02 '17 at 11:59
  • I've seen that if one of the data frames has an NA, the result is NA, but I also get NA where the comparison should be FALSE. How can I keep NA when df1 has NA and df2 doesn't (and vice versa), and still get FALSE when, for example, AT in df1 differs from AA in df2? – mppd May 04 '17 at 13:19
  • @mppd In that case, you just remove one of the `is.na` i.e. `(df1 == df2) & !is.na(df2)` – akrun May 04 '17 at 13:33
  • Last question and I'm done! (: What if there's `<NA>` in df1 and `<NA>` in df2 and I want to see `<NA>` at the end too? – mppd May 05 '17 at 08:33
  • @mppd In that case, just do `(df1 == df2)` – akrun May 05 '17 at 08:35
  • Very very very helpful! You made my day (and my job). Thank you! – mppd May 05 '17 at 08:49

Another possible option,

df1 == replace(df2, is.na(df2), 'NA')

or, if both data frames contain NAs,

replace(df1, is.na(df1), 'NA') == replace(df2, is.na(df2), 'NA')
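
A small sketch of how this behaves, using two hypothetical one-column data frames `a` and `b` (not from the post) with character columns. The point to notice is that NA against NA compares as TRUE here, because both sides become the string 'NA':

```r
# Hypothetical toy data with character columns
a <- data.frame(x = c("AT", NA, NA),   stringsAsFactors = FALSE)
b <- data.frame(x = c("AT", NA, "GG"), stringsAsFactors = FALSE)

# Replace real NA values with the string 'NA' on both sides, then compare
res <- replace(a, is.na(a), "NA") == replace(b, is.na(b), "NA")

# Row 1: "AT" vs "AT"; row 2: NA vs NA; row 3: NA vs "GG"
as.vector(res)
```

This assumes character columns. With factor columns, 'NA' is not an existing level, so the columns would likely need converting with `as.character` first.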
Sotos
  • It doesn't work. It fixes the problem with NA, but gives TRUE where it shouldn't. – mppd May 02 '17 at 12:08
  • I don't get it. So if you have two values 'NA' and 'NA', you want it to display FALSE? – Sotos May 02 '17 at 12:10
  • No. If df1 has AG and df2 has NA, I see FALSE, which is good. But with your code, if df1 has AG and df2 has TT, I see TRUE, which is not what I need. I hope that explains it better – mppd May 02 '17 at 12:13
  • @mppd I get FALSE for that. I get the same result you are after – Sotos May 02 '17 at 12:19
  • I first ran `df1[] <-lapply(df1, as.character); df2[] <- lapply(df2, as.character)` because the levels are different. But in the end I get a table that's all FALSE. – mppd May 04 '17 at 13:15
  • In the end I did this: `x <- df1 == replace(df2, is.na(df2), 'NA'); y <- df2 == replace(df1, is.na(df1), 'NA'); z <- x == y` – mppd May 04 '17 at 13:29