Let's say I have the following df:

mydf <- data.frame(col1 = c("Red", "Red", "Blue", "Orange"),
                   col2 = c("Red", "Blue", NA, "Red"),
                   col3 = c("Red", "Red", "Blue", "Red"),
                   col4 = c("Red", "Red", "Blue", "Blue"))

I'd like to create a column called "all_true" that is TRUE if all non-NA values across columns 1-4 are equal to the same value, and FALSE otherwise. This should result in the following:

    col1 col2 col3 col4 all_true
1    Red  Red  Red  Red     TRUE
2    Red Blue  Red  Red    FALSE
3   Blue <NA> Blue Blue     TRUE
4 Orange  Red  Red Blue    FALSE

Note that the NA in col2 should not count against equality. I've tried using all() to test for equality, but it doesn't seem to work well in dplyr chains.

markus
Parseltongue

4 Answers

One dplyr and purrr solution could be:

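library(dplyr)
library(purrr)

# transpose() turns each row into a list, so unlist() it before dropping NAs
mydf %>%
 mutate(all_equal = map_dbl(.x = transpose(select(., everything())), 
                            ~ n_distinct(na.omit(unlist(.x)))) == 1)

    col1 col2 col3 col4 all_equal
1    Red  Red  Red  Red      TRUE
2    Red Blue  Red  Red     FALSE
3   Blue <NA> Blue Blue      TRUE
4 Orange  Red  Red Blue     FALSE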
tmfmnk

You can use c_across() with rowwise().

library(dplyr)

mydf %>%
  rowwise() %>%
  mutate(all_true = n_distinct(c_across(col1:col4), na.rm = TRUE) == 1) %>%
  ungroup()

# # A tibble: 4 x 5
#   col1   col2  col3  col4  all_true
#   <chr>  <chr> <chr> <chr> <lgl>   
# 1 Red    Red   Red   Red   TRUE    
# 2 Red    Blue  Red   Red   FALSE   
# 3 Blue   NA    Blue  Blue  TRUE    
# 4 Orange Red   Red   Blue  FALSE  
Darren Tsai

In base R you can do

mydf$all_equal <- apply(mydf, 1, function(x) length(unique(na.omit(x)))) == 1

Output

#     col1 col2 col3 col4 all_equal
# 1    Red  Red  Red  Red      TRUE
# 2    Red Blue  Red  Red     FALSE
# 3   Blue <NA> Blue Blue      TRUE
# 4 Orange  Red  Red Blue     FALSE
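
One edge case the question does not cover is a row where every value is NA. The sketch below, using a small hypothetical data frame (df_all_na is not part of the question), shows how the distinct-count test behaves there: comparing the count with == 1 marks such a row FALSE, while <= 1 would mark it TRUE. Which one is right depends on what "all equal" should mean for an empty set of values.

```r
# Hypothetical data (not from the question): the second row is entirely NA
df_all_na <- data.frame(col1 = c("Red", NA),
                        col2 = c("Red", NA))

# Number of distinct non-NA values in each row
n_vals <- apply(df_all_na, 1, function(x) length(unique(na.omit(x))))

strict  <- n_vals == 1   # an all-NA row has 0 distinct values, so FALSE
lenient <- n_vals <= 1   # treats an all-NA row as trivially equal, so TRUE
```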
Ric S

Adapting a previous answer:

mydf['all_true'] <- (rowSums(mydf == mydf[,1], na.rm=TRUE) + rowSums(is.na(mydf))) == ncol(mydf)
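
Because this approach compares every column against the first one, it appears to rely on col1 not being NA. A minimal sketch with a hypothetical data frame (df_na_first is made up for illustration) contrasts it with counting distinct non-NA values per row:

```r
# Hypothetical data (not from the question): the NA sits in col1 in row 2
df_na_first <- data.frame(col1 = c("Red", NA),
                          col2 = c("Red", "Blue"),
                          col3 = c("Red", "Blue"),
                          col4 = c("Red", "Blue"))

# Comparison against the first column: in row 2 every comparison is NA,
# so only the single NA is counted and the total falls short of ncol()
first_col <- (rowSums(df_na_first == df_na_first[, 1], na.rm = TRUE) +
                rowSums(is.na(df_na_first))) == ncol(df_na_first)

# Counting distinct non-NA values does not depend on which column holds the NA
distinct <- apply(df_na_first, 1,
                  function(x) length(unique(na.omit(x)))) == 1
```

Here first_col flags row 2 as not equal even though its non-NA values are all "Blue", whereas distinct flags it as equal.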
CPak