Comparing two columns in a dataframe using R

Question

I am trying to compare two columns in a dataframe to find rows where the two columns are not equal.

I would do:

df %>% filter(column1 != column2)

This will give me cases where values exist in both columns and are not equal (e.g. column1 = 5, column2 = 6)

However it will not give me cases where one of the values is NA (e.g. column1 = NA, column2 = 7)

How can I include the latter case into the filter function?

Thanks

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Oct 27 '21 at 03:46

U13-Forward · Answer 1 · 2021-10-27T06:02:41.033

3

Or use xor:

df %>% filter(a != b | xor(is.na(a), is.na(b)))

Or as @thelatemail mentioned, you could use Base R:

df[which(df$a != df$b | xor(is.na(df$a), is.na(df$b))),]

Or as @runr mentioned, you could try subset in Base R:

subset(df, a != b | xor(is.na(a), is.na(b)))

edited Oct 27 '21 at 06:02

answered Oct 27 '21 at 03:35

U13-Forward

69,221
14
89
114

I think this code will include (NA, NA), which I do not want in the final df since they are 'equal' for my purposes – MLE Oct 27 '21 at 03:42
@MLE Check it out now, `xor` – U13-Forward Oct 27 '21 at 03:46
@MLE `xor` is `or`, but not `and`. So it selects if one out of the two is `NA`, but won't select if both of them are – U13-Forward Oct 27 '21 at 03:47
1

Nice logic. Interesting that this selection only works in `dplyr::filter` because `NA` selections are dropped. Got to add a `which()` for `[]` selection - `df[which(df$a != df$b | xor(is.na(df$a), is.na(df$b))),]` – thelatemail Oct 27 '21 at 04:45
@thelatemail Yeap! Thanks for a solution in Base R, added that! – U13-Forward Oct 27 '21 at 05:12
1

Also just ``subset(df, a != b | xor(is.na(a), is.na(b)))`` instead of ``with()`` does the job with base R – runr Oct 27 '21 at 06:01

Ronak Shah · Answer 2 · 2021-10-27T03:43:20.360

2

You can include them with an OR (|) condition -

library(dplyr)

df <- data.frame(a = c(1, 2, NA, 4, 5), b = c(NA, 2, 3, 4, 8))

df %>% filter(a != b | is.na(a) | is.na(b))

#   a  b
#1  1 NA
#2 NA  3
#3  5  8

Another option would be to change NA values to string "NA" and then only using a != b should work.

df %>% 
  mutate(across(.fns = ~replace(., is.na(.), 'NA'))) %>%
  filter(a != b) %>%
  type.convert(as.is = TRUE)

edited Oct 27 '21 at 03:43

answered Oct 27 '21 at 03:30

Ronak Shah

377,200
20
156
213

With this, wouldn't (NA, NA) be included? (which I don't want since they are 'equal' for my purposes) Given that NA == NA is NA not TRUE. Just looking for a clean solution – MLE Oct 27 '21 at 03:39
1

@MLE See the updated answer. An example dataframe with a sample output would have helped to better understand the problem. – Ronak Shah Oct 27 '21 at 03:45

score 0 · Answer 3 · answered Oct 27 '21 at 16:55

0

We can use if_any

library(dplyr)
df %>% 
   filter(a != b | if_any(everything(), is.na))
   a  b
1  1 NA
2 NA  3
3  5  8

data

df <- structure(list(a = c(1, 2, NA, 4, 5), b = c(NA, 2, 3, 4, 8)),
 class = "data.frame", row.names = c(NA, 
-5L))

answered Oct 27 '21 at 16:55

akrun

874,273
37
540
662

Comparing two columns in a dataframe using R

3 Answers3

data