I need to apply a special rule to some of my data: if a value is <= 0.1, it is an error and should be set to missing. However, I only want to do this for certain categories.

My data looks like this

   Category              value
     A                     0.9
     A                     0.001
     A                     0.3
     B                     0.01
     B                     0.8
     C                     0.01
     C                     0.01
     C                     0.2
     C                     NA

I want this

   Category              value
     A                     0.9
     A                     0.001
     A                     0.3
     B                     NA
     B                     0.8
     C                     NA
     C                     NA
     C                     0.2
     C                     NA

My code looks like this:

 want<- Mydata %>% 
           mutate(value2= if_else(!is.na(value) &
                                   value<=0.1 & 
                                   Category=='B' ||
                                   !is.na(value) &
                                   value<=0.1 & 
                                   Category=='C',
                                 as.numeric(NA), value ) )

But I get this error message:

 Error: `true` must be length 1 (length of `condition`), not 1245

My understanding is that || is a logical OR and & is an elementwise AND, so essentially I want to say:

IF (NOT NA AND <= 0.1 AND in category B) OR (NOT NA AND <= 0.1 AND in category C) then make the value NA, else use the original value.

I don't understand why I get this error. Do I misunderstand | vs || and & vs &&?

Jacob Ian

2 Answers

Use case_when

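library(dplyr)
d %>%
    mutate(value = case_when(
        # NA_real_ (rather than NaN) gives a true missing value of type double
        value <= 0.1 & Category %in% c("B", "C") ~ NA_real_,
        # every other row, including rows where value is already NA, is kept as-is
        TRUE ~ value
    ))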
d.b
  • Yes, the chosen answer better answers the question, but in my experience `case_when` is far more readable and less prone to bugs, and should be used for anything more complex than `if_else( a > 5, "big", "small")`, imo. That said, isn't the convention to reserve NA for the `TRUE` (everything else is `FALSE`) result? – GenesRus Oct 08 '19 at 20:14

The issue is the use of ||, which returns a single TRUE/FALSE value, instead of the elementwise |. That is why if_else sees a condition of length 1. According to ?"||":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
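
To make the distinction in the quoted passage concrete, here is a small base R sketch. The vectors `x` and `g` are made-up illustration data, not taken from the question.

```r
# Hypothetical illustration vectors (not from the question)
x <- c(0.9, 0.01, NA)
g <- c("A", "B", "C")

# `&` and `|` work elementwise and return one result per element
elementwise <- !is.na(x) & x <= 0.1 & (g == "B" | g == "C")
print(elementwise)
# [1] FALSE  TRUE FALSE

# `&&` and `||` are meant for single TRUE/FALSE values, e.g. inside if ()
if (length(x) > 0 && is.numeric(x)) {
  message("x is a non-empty numeric vector")
}
```

Note that R 4.3.0 and later raise an error when `||` or `&&` is given a vector longer than one, instead of silently using only the first element.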

library(dplyr)
Mydata %>% 
       mutate(value2= if_else(((!is.na(value)) &
                               (value<=0.1) & 
                               (Category=='B')) |    # elementwise OR, not ||
                               ((!is.na(value)) &
                               (value<=0.1) & 
                               (Category=='C')),
                               NA_real_, value ) )
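
As a check, the sketch below rebuilds the sample data from the question and applies the same rule. The names `Mydata`, `want`, and `value2` follow the question; using `%in%` in place of the two `==` comparisons is a simplification and an assumption on my part, not part of the code above.

```r
library(dplyr)

# Sample data copied from the question
Mydata <- data.frame(
  Category = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  value    = c(0.9, 0.001, 0.3, 0.01, 0.8, 0.01, 0.01, 0.2, NA)
)

want <- Mydata %>%
  mutate(value2 = if_else(
    # one elementwise condition: not missing, at or below 0.1, and in B or C
    !is.na(value) & value <= 0.1 & Category %in% c("B", "C"),
    NA_real_,
    value
  ))

print(want)
```

With this data, `value2` should be NA for the B row with 0.01 and the two C rows with 0.01, the 0.001 in category A should be left alone, and the row that was already NA stays NA.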
akrun