
I have a data frame (my_dataframe) with 5 columns, all holding 0 or 1. I would like to create a new column, cn7_any, that is 1 whenever any of columns 2:5 equals 1.

structure(list(cn7_normal = c(1L, 1L, 1L, 1L, 1L, 1L), 
    cn7_right_paralysis_central = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_right_paralysis_peripheral = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_left_paralysis_central = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_left_paralysis_peripheral = c(0L, 0L, 0L, 0L, 0L, 0L)), 
    row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
> head(my_dataframe)
# A tibble: 6 x 5
  cn7_normal cn7_right_paralysis_cen… cn7_right_paralysis_perip… cn7_left_paralysis_cen… cn7_left_paralysis_peri…
       <int>                    <int>                      <int>                   <int>                    <int>
1          1                        0                          0                       0                        0
2          1                        0                          0                       0                        0

I could do it successfully with case_when():

my_dataframe<-my_dataframe%>%
        mutate(cn7_paralisis_any=case_when(cn7_right_paralysis_central==1 ~ 1,
                                           cn7_right_paralysis_peripheral==1 ~ 1,
                                           cn7_left_paralysis_central==1 ~ 1,
                                           cn7_left_paralysis_peripheral==1 ~ 1,
                                           TRUE ~ 0)
                )

Although it worked, I wonder whether there is a simpler, less verbose solution. I feel I should be using any() somehow. Any ideas?

M.Viking
GuedesBF

4 Answers

my_dataframe$cn7_any <- apply(my_dataframe[ , 2:5], 1, max)
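
Since the posted data is all zeroes in columns 2:5, here is a small sketch of the same idea on data that does contain some 1s. The toy data frame `df` and its column names are made up for illustration. It also shows `do.call(pmax, ...)`, a vectorised base R alternative that should give the same result without looping over rows.

```r
# Toy data (hypothetical, not the asker's data):
# one column to skip plus three 0/1 flag columns to test
df <- data.frame(
  normal = c(1L, 0L, 0L, 1L),
  a      = c(0L, 1L, 0L, 0L),
  b      = c(0L, 0L, 0L, 0L),
  c      = c(0L, 1L, 1L, 0L)
)

# Row-wise maximum over columns 2:4: 1 if any flag in the row is 1
any_apply <- apply(df[, 2:4], 1, max)

# Same idea with pmax(), which takes the element-wise maximum across the columns
any_pmax <- do.call(pmax, df[, 2:4])

any_apply
any_pmax
```

Both approaches rely on the columns holding only 0 and 1; with missing values, `max()` and `pmax()` return NA unless `na.rm = TRUE` is passed.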
SteveM

Your data is all zeroes, so I'll change a couple to prove the point.

rowSums(my_dataframe[,2:5]) > 0
# [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
+(rowSums(my_dataframe[,2:5]) > 0)
# [1] 0 1 0 1 0 0

my_dataframe$cn7_any <- +(rowSums(my_dataframe[,2:5]) > 0)
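
For anyone unfamiliar with the `+(.)` trick: unary plus coerces a logical vector to integer, so it behaves like `as.integer()` here. A minimal sketch, with a made-up vector `flags`:

```r
# Hypothetical logical vector, including a missing value
flags <- c(FALSE, TRUE, NA)

plus_result <- +flags              # unary plus: logical -> integer
int_result  <- as.integer(flags)   # the explicit equivalent

plus_result
identical(plus_result, int_result)
```

`as.integer()` is more explicit if you find the `+` hard to read; the result is the same.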

Within dplyr,

my_dataframe %>%
  mutate(cn7_any = rowSums(across(-cn7_normal, ~ . > 0)) > 0)
# # A tibble: 6 x 6
#   cn7_normal cn7_right_paralysis_central cn7_right_paralysis_peripheral cn7_left_paralysis_central cn7_left_paralysis_peripheral cn7_any
#        <int>                       <int>                          <int>                      <int>                         <int> <lgl>  
# 1          1                           0                              0                          0                             0 FALSE  
# 2          1                           0                              0                          0                             1 TRUE   
# 3          1                           0                              0                          0                             0 FALSE  
# 4          1                           0                              0                          1                             0 TRUE   
# 5          1                           0                              0                          0                             0 FALSE  
# 6          1                           0                              0                          0                             0 FALSE  

What you're computing is really a logical value rather than a number, but if you want 0/1 integers, just use the +(.) trick as above:

my_dataframe %>%
  mutate(cn7_any = +(rowSums(across(-cn7_normal, ~ . > 0)) > 0))
r2evans

Similar to "Using any() vs | in dplyr::mutate".

I also changed a few digits in your dataset.

V2: Using or |

V3: Call dplyr::rowwise() before mutate() so that each row is treated as its own group, then use any(). Without rowwise(), any() looks at the entire column vectors at once, which is why you get the unexpected result.

my_dataframe<-structure(list(cn7_normal = c(1L, 1L, 1L, 1L, 1L, 1L), 
    cn7_right_paralysis_central         = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_right_paralysis_peripheral      = c(1L, 0L, 0L, 0L, 0L, 0L), 
    cn7_left_paralysis_central          = c(0L, 1L, 0L, 0L, 0L, 0L), 
    cn7_left_paralysis_peripheral       = c(0L, 0L, 0L, 0L, 0L, 0L)), 
    row.names = c(NA, -6L), 
    class = c("tbl_df", "tbl", "data.frame"))

my_dataframe%>%
  rowwise() %>% ### rowwise ###
  mutate(cn7_paralisis_any=case_when(cn7_right_paralysis_central==1 ~ 1,
                                     cn7_right_paralysis_peripheral==1 ~ 1,
                                     cn7_left_paralysis_central==1 ~ 1,
                                     cn7_left_paralysis_peripheral==1 ~ 1,
                                     TRUE ~ 0),
         cn7_v2=(cn7_right_paralysis_central|cn7_right_paralysis_peripheral|cn7_left_paralysis_central|cn7_left_paralysis_peripheral),
         cn7_v3=any(cn7_right_paralysis_central ,cn7_right_paralysis_peripheral, cn7_left_paralysis_central, cn7_left_paralysis_peripheral)
  ) %>% 
  select(cn7_paralisis_any,cn7_v2,cn7_v3)


# A tibble: 6 x 3
# Rowwise: 
#  cn7_paralisis_any cn7_v2 cn7_v3
#              <dbl> <lgl>  <lgl> 
#1                 1 TRUE   TRUE  
#2                 1 TRUE   TRUE  
#3                 0 FALSE  FALSE 
#4                 0 FALSE  FALSE 
#5                 0 FALSE  FALSE 
#6                 0 FALSE  FALSE 
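
To see why rowwise() matters for any(), here is a minimal sketch on a made-up two-column tibble (`df`, `a` and `b` are illustrative names, not the asker's data). It computes any() once without rowwise() and once with it:

```r
library(dplyr)

# Hypothetical toy data: only the first row contains a 1
df <- tibble(a = c(1L, 0L, 0L), b = c(0L, 0L, 0L))

res <- df %>%
  # No rowwise(): any() sees the whole columns, so one TRUE is recycled to every row
  mutate(any_whole_column = any(a == 1, b == 1)) %>%
  rowwise() %>%
  # With rowwise(): any() is evaluated separately for each row
  mutate(any_per_row = any(a == 1, b == 1)) %>%
  ungroup()

res
```

The `ungroup()` at the end drops the rowwise grouping so later steps operate on whole columns again.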
M.Viking

I now use dplyr::if_any and dplyr::if_all in such cases. I think it makes the code very clear and readable whenever we must perform such rowwise logical operations in dplyr.

For this particular case, I would now use:

library(dplyr)

my_dataframe %>%
     mutate(cn7_paralisis_any = +if_any(-cn7_normal, ~ .x == 1))
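
As a rough illustration of how if_any() and if_all() differ, here is a sketch on a made-up tibble (`df`, `left` and `right` are hypothetical names). It assumes dplyr 1.0.4 or later, where these functions were introduced:

```r
library(dplyr)

# Hypothetical toy data with two 0/1 flag columns to test
df <- tibble(
  normal = c(1L, 0L, 0L),
  left   = c(0L, 1L, 1L),
  right  = c(0L, 0L, 1L)
)

res <- df %>%
  mutate(
    any_flag = +if_any(c(left, right), ~ .x == 1),  # 1 if at least one flag is 1
    all_flag = +if_all(c(left, right), ~ .x == 1)   # 1 only if every flag is 1
  )

res
```

Naming the columns explicitly, rather than excluding one with `-`, avoids accidentally testing helper columns that were added to the data frame earlier.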
GuedesBF