Test for equality across subset of columns in dplyr

Question

Let's say I have the following df:

mydf <- data.frame(col1 = c("Red", "Red", "Blue", "Orange"),
                   col2 = c("Red", "Blue", NA, "Red"),
                   col3 = c("Red", "Red", "Blue", "Red"),
                   col4 = c("Red", "Red", "Blue", "Blue"))

I'd like to create a column called "all_equal" that is set to TRUE if all non-NA values across columns 1-4 are equal to the same value. This should result in the following:

    col1 col2 col3 col4 all_equal
1    Red  Red  Red  Red      TRUE
2    Red Blue  Red  Red     FALSE
3   Blue <NA> Blue Blue      TRUE
4 Orange  Red  Red Blue     FALSE

Note that the NA in column two should not count against equality. I've tried using all() to test for equality, but it does not seem to work well in dplyr chains.

score 2 · Answer 1 · answered Jun 09 '20 at 15:14

One dplyr and purrr solution could be:

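library(dplyr)
library(purrr)

# transpose() yields one list per row; unlist() turns it into a vector
# so that na.omit() can actually drop the missing values
mydf %>%
 mutate(all_equal = map_dbl(.x = transpose(select(., everything())), 
                            ~ n_distinct(na.omit(unlist(.x)))) == 1)

    col1 col2 col3 col4 all_equal
1    Red  Red  Red  Red      TRUE
2    Red Blue  Red  Red     FALSE
3   Blue <NA> Blue Blue      TRUE
4 Orange  Red  Red Blue     FALSE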

score 2 · Accepted Answer · answered Jun 09 '20 at 16:10

You can use c_across() with rowwise().

library(dplyr)

mydf %>%
  rowwise() %>%
  mutate(all_true = n_distinct(c_across(col1:col4), na.rm = TRUE) == 1) %>%
  ungroup()

# # A tibble: 4 x 5
#   col1   col2  col3  col4  all_true
#   <chr>  <chr> <chr> <chr> <lgl>   
# 1 Red    Red   Red   Red   TRUE    
# 2 Red    Blue  Red   Red   FALSE   
# 3 Blue   NA    Blue  Blue  TRUE    
# 4 Orange Red   Red   Blue  FALSE
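
To see what the row-wise expression evaluates for a single row, here is a minimal sketch that applies n_distinct() to plain vectors. The vectors row3 and all_missing are made up for illustration: the first mirrors row 3 of mydf, the second is an assumed edge case that is not in the question.

```r
library(dplyr)

# Row 3 of mydf written as a plain character vector (one value is missing)
row3 <- c("Blue", NA, "Blue", "Blue")

# Without na.rm, NA counts as a distinct value of its own
distinct_with_na <- n_distinct(row3)

# With na.rm = TRUE, only "Blue" remains, so the row passes the == 1 test
distinct_without_na <- n_distinct(row3, na.rm = TRUE)

# Hypothetical edge case: a row that is entirely NA has zero distinct
# values after removal, so the == 1 test marks it FALSE
all_missing <- c(NA_character_, NA_character_)
all_missing_flag <- n_distinct(all_missing, na.rm = TRUE) == 1
```

If all-NA rows should count as equal, the comparison could be changed to <= 1; whether that is desirable depends on the data.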

Darren Tsai

Nice! The cleanest solution, yet! – Parseltongue Jun 09 '20 at 16:31

score 1 · Answer 3 · answered Jun 09 '20 at 15:24

In base R you can do

mydf$all_equal <- apply(mydf, 1, function(x) length(unique(na.omit(x)))) == 1

Output

#     col1 col2 col3 col4 all_equal
# 1    Red  Red  Red  Red      TRUE
# 2    Red Blue  Red  Red     FALSE
# 3   Blue <NA> Blue Blue      TRUE
# 4 Orange  Red  Red Blue     FALSE
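
Because apply() runs over every column of the data frame, the one-liner above would also pick up any extra columns. The following is a sketch of how the same idea might be limited to a subset of columns; rows_all_equal() is a hypothetical helper name, not part of base R.

```r
mydf <- data.frame(col1 = c("Red", "Red", "Blue", "Orange"),
                   col2 = c("Red", "Blue", NA, "Red"),
                   col3 = c("Red", "Red", "Blue", "Red"),
                   col4 = c("Red", "Red", "Blue", "Blue"))

# Hypothetical helper: TRUE for each row whose non-NA values in `cols`
# are all identical; rows with no non-NA value come out FALSE
rows_all_equal <- function(df, cols) {
  unname(apply(df[cols], 1, function(x) length(unique(na.omit(x))) == 1))
}

# All four columns, as in the question
all_four <- rows_all_equal(mydf, c("col1", "col2", "col3", "col4"))

# Only the last two columns
last_two <- rows_all_equal(mydf, c("col3", "col4"))
```

Selecting the columns first with df[cols] keeps the check independent of whatever else is stored in the data frame, including a result column added earlier.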

score 0 · Answer 4 · answered Jun 09 '20 at 15:30

Adapting a previous answer:

mydf['all_true'] <- (rowSums(mydf == mydf[,1], na.rm=TRUE) + rowSums(is.na(mydf))) == ncol(mydf)
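
One caveat with comparing against the first column: if the first column itself is NA in some row, every comparison in that row is NA, so the row is flagged FALSE even when the remaining values agree. The sketch below illustrates this with a small made-up data frame (df_na_first is not from the question) and contrasts it with counting distinct non-NA values.

```r
# Made-up data: row 1 has NA in the first column, the other values agree
df_na_first <- data.frame(col1 = c(NA, "Red"),
                          col2 = c("Blue", "Red"),
                          col3 = c("Blue", "Red"))

# Comparison against the first column: row 1 yields only NA comparisons,
# so the sum (0 matches + 1 NA) falls short of ncol()
via_first_col <- unname(
  (rowSums(df_na_first == df_na_first[, 1], na.rm = TRUE) +
     rowSums(is.na(df_na_first))) == ncol(df_na_first)
)

# Counting distinct non-NA values does not depend on any single column
via_unique <- unname(
  apply(df_na_first, 1, function(x) length(unique(na.omit(x))) == 1)
)
```

For the data in the question the first column has no missing values, so both approaches give the same result there.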

CPak
