
Consider the following code:

library(dplyr)
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)
myf <- function(patientID, age, diabetes, status) { isTRUE(all.equal(age, 34))}
mutate(patientdata, isAge34 = myf(patientID, age, diabetes, status))

I wrote myf to return TRUE for the row where age == 34, but this doesn't work:

  patientID age diabetes    status isAge34
1         1  25    Type1      Poor   FALSE
2         2  34    Type2  Improved   FALSE
3         3  28    Type1 Excellent   FALSE
4         4  52    Type1      Poor   FALSE

Why didn't this work? Am I doing something wrong?


EDIT: This is a contrived, simplified example. In reality, I have much more complicated logic inside of my function.

My motivation for asking:

  • I thought that I was supposed to prefer isTRUE(all.equal()) over == because that's the R way of doing things.

Reference 1, Reference 2:

For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable. See the examples.
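For context, here is a minimal base-R sketch of what the quoted help text warns about. The numbers 0.1, 0.2 and 0.3 are an illustrative choice and do not come from the question. An exact `==` test fails on a sum that is mathematically equal to 0.3. `isTRUE(all.equal())` accepts the same sum because it stays within the default tolerance of `sqrt(.Machine$double.eps)`, which is roughly 1.5e-8.

```r
# Exact comparison: 0.1 and 0.2 have no exact binary representation,
# so their sum differs from the double closest to 0.3 in the last bits.
exact <- (0.1 + 0.2 == 0.3)

# Near-equality: all.equal() allows a small difference (default tolerance
# sqrt(.Machine$double.eps)), and isTRUE() turns its result (TRUE, or a
# character string describing the difference) into a single logical.
near_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))

print(exact)       # FALSE
print(near_equal)  # TRUE
```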

Asked by Jim G. (edited by Community)

  • Your function doesn't make any sense. If you want to compare `age`, why are you passing all the rest of the columns? If this is a function, why did you hardcode the variable names in the first place? You could either just do `patientdata %>% mutate(age == 34)` or `myf <- function(x) x == 34 ; patientdata %>% mutate(myf(age))`. – David Arenburg Sep 28 '16 at 12:33
  • I agree with @DavidArenburg, but as far as your function goes, it will work if you `Vectorize` it (`myf <- Vectorize(myf)`). – Sotos Sep 28 '16 at 12:35
  • @DavidArenburg: But I thought that `isTRUE(all.equal())` was preferred over `==`. – Jim G. Sep 28 '16 at 12:35
  • @JimG. 1- `all.equal()` is not vectorized; see Sotos' comment. 2- Do you have any reference for that statement (I would love to see it)? You could basically stick with your approach (if efficiency doesn't matter), but *you need to write the function in a reasonable manner*, for instance `myf <- function(x) isTRUE(all.equal(x, 34)) ; patientdata %>% rowwise() %>% mutate(myf(age))` – David Arenburg Sep 28 '16 at 12:40
  • @DavidArenburg: I just edited my question and cited two references. All in all, I'm not really fussy about a certain approach. I'm just trying to do things "the R way". – Jim G. Sep 28 '16 at 12:47
  • @JimG. You are talking about floating-point numbers, not integers (like in your case). And the main problem is not with the general approach (which can be used as pointed out in my comment), but rather with the logic of your function, which isn't really a function. – David Arenburg Sep 28 '16 at 12:50
  • @DavidArenburg: OK. So if they were floating point numbers, I suppose that I would need to use an alternative approach (like the one you described with `rowwise()`)? – Jim G. Sep 28 '16 at 12:52
  • I think setting some tolerance level and keeping it vectorized is a better approach. See [here](http://stackoverflow.com/questions/2769510/numeric-comparison-difficulty-in-r). I'm not an authority on the matter, though, so it's up to you to decide. My main complaint is about your code, not the actual approach. – David Arenburg Sep 28 '16 at 12:56
  • @DavidArenburg: Very interesting. Thanks a lot for your help. – Jim G. Sep 28 '16 at 12:59
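The two fixes suggested in the comments can be sketched as follows. The sketch assumes a one-argument helper in the shape David Arenburg recommends, rather than the original four-argument `myf`. The name `is_34` is hypothetical. `Vectorize()` would wrap the original `myf` in the same way.

```r
library(dplyr)

patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

# Scalar check: only meaningful when x has length 1 (hypothetical name).
is_34 <- function(x) isTRUE(all.equal(x, 34))

# Option 1 (Sotos): Vectorize() wraps the function in mapply(), so it is
# called once per element and the results are simplified into a vector.
is_34_vec <- Vectorize(is_34)
v1 <- mutate(patientdata, isAge34 = is_34_vec(age))

# Option 2 (David Arenburg): rowwise() makes mutate() evaluate the
# expression separately for each row, so x always has length 1.
v2 <- patientdata %>%
  rowwise() %>%
  mutate(isAge34 = is_34(age)) %>%
  ungroup()

print(v1$isAge34)  # FALSE TRUE FALSE FALSE
print(v2$isAge34)  # FALSE TRUE FALSE FALSE
```

Both options call `all.equal()` once per row, so on large data they will be slower than a single vectorized comparison.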

2 Answers


The point of all.equal is to check for near equality, most commonly for use with floating point numbers. Comparisons with == will not give reliable results for floating point numbers (the link provided in @Andrew's comment explains this). Therefore the accepted answer is not actually a correct solution, because the data frame described in the original post specifies the age variable as numeric (not integer!).

dplyr provides the near function, which is basically a vectorized version of all.equal that works with mutate.

mutate(patientdata, isAge34 = near(age, 34))
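Here is a short sketch of how `near()` behaves. `near(x, y, tol)` compares element by element and returns TRUE where `abs(x - y) < tol`. The default value of `tol` is `sqrt(.Machine$double.eps)`. One difference is worth knowing: `all.equal()` measures a relative difference by default, whereas `near()` uses an absolute one, so the two can disagree for very large or very small magnitudes. The base-R line below is one way to write the vectorized tolerance check David Arenburg mentions in the comments. The variable names are illustrative.

```r
library(dplyr)

patientdata <- data.frame(patientID = c(1, 2, 3, 4), age = c(25, 34, 28, 52))

# near() is vectorized: one comparison per element of age.
a <- mutate(patientdata, isAge34 = near(age, 34))

# The same idea in base R with an explicit absolute tolerance
# (chosen here to match near()'s default).
tol <- sqrt(.Machine$double.eps)
b <- mutate(patientdata, isAge34 = abs(age - 34) < tol)

# Where the tolerance matters: 0.1 * 3 is not stored as exactly 0.3.
x <- 0.1 * 3
exact  <- x == 0.3       # FALSE: floating-point rounding
approx <- near(x, 0.3)   # TRUE: difference is below tol

print(a$isAge34)  # FALSE TRUE FALSE FALSE
```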
Answered by mikeck

As @DavidArenburg said, all.equal() is not vectorized.
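The following simplified sketch shows why `myf` returns FALSE for every row. Only the `age` column is kept, and the names `res`, `one_value` and `out` are illustrative. Inside `mutate()`, the function receives the whole `age` column at once, so `all.equal()` compares a length-4 vector with 34 as a single object.

```r
library(dplyr)

age <- c(25, 34, 28, 52)

# all.equal() compares the whole vector with 34 as one object, not element
# by element; because the lengths differ, it returns a character string
# describing the mismatch instead of TRUE.
res <- all.equal(age, 34)
print(res)

# isTRUE() is TRUE only for a single TRUE, so the call collapses to one FALSE...
one_value <- isTRUE(all.equal(age, 34))

# ...and mutate() recycles that length-1 result to every row.
patientdata <- data.frame(patientID = c(1, 2, 3, 4), age = age)
out <- mutate(patientdata, isAge34 = isTRUE(all.equal(age, 34)))
print(out$isAge34)  # FALSE FALSE FALSE FALSE
```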

The following code will work:

mutate(patientdata, isAge34 = age == 34)
Answered by Jim G. (edited by Community)
  • I don't think defining this function is necessary (or reasonable: the only purpose of the function is to check the equality of its argument to a fixed value). @DavidArenburg's suggestion above of using `mutate(patientdata, isAge34 = age == 34)` works and is a lot simpler. – ddiez Oct 04 '16 at 14:54
  • @ddiez: I agree. Please see my revision. – Jim G. Oct 04 '16 at 14:58
  • This works for integers, but you can fall into the classic floating point trap! http://www.burns-stat.com/pages/Tutor/R_inferno.pdf – Andrew Aug 04 '17 at 14:59
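The trap Andrew describes can be sketched with hypothetical data. A column produced by `seq()` with a fractional step holds values that print as 0, 0.1, 0.2 and so on, but they are not all exactly those decimals. The names `doses` and `checked` are illustrative.

```r
library(dplyr)

# Hypothetical data: seq() computes from + (0:n) * by, so the fourth value
# is 3 * 0.1, which is not exactly the double closest to 0.3.
doses <- data.frame(dose = seq(0, 0.5, by = 0.1))

checked <- mutate(doses,
  exact  = dose == 0.3,      # exact comparison: misses the 0.3 row
  approx = near(dose, 0.3)   # tolerance-based: finds it
)
print(checked)
```

Here `exact` is FALSE in every row, while `approx` is TRUE in the fourth row.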