Why doesn't all.equal work within dplyr's mutate function?

Question

Consider the following code:

```r
library(dplyr)

patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)

# Intended to flag the row where age is 34
myf <- function(patientID, age, diabetes, status) {
  isTRUE(all.equal(age, 34))
}
mutate(patientdata, isAge34 = myf(patientID, age, diabetes, status))
```

I wrote myf to return TRUE for the row where age == 34, but this doesn't work:

```
  patientID age diabetes    status isAge34
1         1  25    Type1      Poor   FALSE
2         2  34    Type2  Improved   FALSE
3         3  28    Type1 Excellent   FALSE
4         4  52    Type1      Poor   FALSE
```

Why didn't this work? Did I do something wrong?

EDIT: This is a contrived, simplified example. In reality, I have much more complicated logic inside of my function.

My motivation for asking:

I thought that I was supposed to prefer isTRUE(all.equal()) over == because that's the R way of doing things.

Reference 1, Reference 2:

For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable. See the examples.
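To make the quoted warning concrete, here is a small base R sketch (the variable names are my own, chosen only for illustration) of a sum that prints as 0.3 but is not stored as exactly 0.3:

```r
# 0.1 and 0.2 have no exact binary representation, so their sum
# differs from the stored value of 0.3 by a tiny rounding error.
x <- 0.1 + 0.2

# Exact comparison: the rounding error makes this FALSE.
exact_match <- x == 0.3

# Tolerance-based comparison: all.equal() ignores differences this small,
# so wrapping it in isTRUE() gives TRUE.
approx_match <- isTRUE(all.equal(x, 0.3))

print(exact_match)
print(approx_match)
```

Note that both values here are single numbers, which is the case `all.equal()` handles as intended.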

Your function doesn't make any sense. If you want to compare `age`, why are you passing all the rest of the columns? If this is a function, why did you hardcode the variable names in the first place? You could either do `patientdata %>% mutate(age == 34)` or `myf <- function(x) x == 34 ; patientdata %>% mutate(myf(age))`. — David Arenburg, Sep 28 '16 at 12:33
I agree with @DavidArenburg but as far as your function goes, it will work if you `Vectorize` it (`myf <- Vectorize(myf)`) — Sotos, Sep 28 '16 at 12:35
@DavidArenburg: But I thought that `isTRUE(all.equal())` was preferred over `==`. — Jim G., Sep 28 '16 at 12:35
@JimG. 1- `all.equal()` is not vectorized- see Sotos comment. 2- Do you have any reference for that statement (I would love to see it)? You basically could stick with your approach (if efficiency doesn't matter) but *you need to write the function in a reasonable manner*, for instance `myf <- function(x) isTRUE(all.equal(x, 34)) ; patientdata %>% rowwise() %>% mutate(myf(age))` — David Arenburg, Sep 28 '16 at 12:40
@DavidArenburg: I just edited my question and cited two references. All in all, I'm not really fussy about a certain approach. I'm just trying to do things "the R way". — Jim G., Sep 28 '16 at 12:47
@JimG. You are talking about floating points, not integers (like in your case). And the main problem is not with the general approach (which can be used, as pointed out in my comment) but rather with the logic of your function, which isn't really a function — David Arenburg, Sep 28 '16 at 12:50
@DavidArenburg: OK. So if they were floating point numbers, I suppose that I would need to use an alternative approach (like the one you described with `rowwise()`)? — Jim G., Sep 28 '16 at 12:52
I think setting some tolerance level and keeping it vectorized is a better approach. See [here](http://stackoverflow.com/questions/2769510/numeric-comparison-difficulty-in-r). Though I'm not an authority in any matter, so it's up to you to decide. My main complaint is regarding your code, not the actual approach — David Arenburg, Sep 28 '16 at 12:56
@DavidArenburg: Very interesting. Thanks a lot for your help. — Jim G., Sep 28 '16 at 12:59
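The element-by-element suggestions in the comments above (`Vectorize` and a row-at-a-time call) can be sketched in base R roughly as follows. The helper name `is_34` is hypothetical, and `vapply` stands in here for dplyr's `rowwise()` so the example needs no packages:

```r
age <- c(25, 34, 28, 52)

# Hypothetical one-argument helper: meaningful only for a single value,
# because all.equal() compares its two arguments as whole objects.
is_34 <- function(x) isTRUE(all.equal(x, 34))

# Option 1: wrap the helper so it is applied to each element in turn.
is_34_vec <- Vectorize(is_34)
via_vectorize <- is_34_vec(age)

# Option 2: loop explicitly, declaring that each call returns one logical.
via_vapply <- vapply(age, is_34, logical(1))

print(via_vectorize)
print(via_vapply)
```

Both options call `all.equal()` once per element, so they are likely to be slower than a vectorized tolerance check on long columns.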

score 6 · Answer 1 · answered Feb 26 '18 at 20:00

The point of all.equal is to check for near equality, most commonly for use with floating point numbers. Comparisons with == will not give reliable results for floating point numbers (the link provided in @Andrew's comment explains this). Therefore the accepted answer is not actually a correct solution, because the data frame described in the original post stores the age variable as numeric (not integer!).

dplyr provides the near function, which is basically a vectorized version of all.equal that works with mutate.

```r
mutate(patientdata, isAge34 = near(age, 34))
```
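For readers without dplyr at hand, the idea behind an element-wise tolerance check can be sketched in base R. The helper `is_close` below is hypothetical (it is not part of dplyr), its default tolerance is an assumption, and the `1e-10` offset is an artificial stand-in for accumulated rounding error:

```r
# Hypothetical helper: TRUE wherever x and y differ by less than tol.
# It is vectorized because abs(), "-" and "<" all work element-wise.
is_close <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

# The second age is 34 plus a tiny artificial error.
age <- c(25, 34 + 1e-10, 28, 52)

exact <- age == 34          # misses the second element
close <- is_close(age, 34)  # catches it

print(exact)
print(close)
```

Because the helper returns one logical per element, it can be used in a column assignment or inside `mutate` the same way `near` is used above.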

score 2 · Accepted Answer · edited May 23 '17 at 10:34

As @DavidArenburg said, all.equal() is not vectorized.

The following code will work:

```r
mutate(patientdata, isAge34 = age == 34)
```
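To see why the original function produced FALSE on every row, it may help to run the comparison outside of `mutate`. This is a minimal base R sketch; `data.frame` is used in place of `mutate` only to show the recycling, and the names `res`, `flag` and `df` are illustrative:

```r
age <- c(25, 34, 28, 52)

# all.equal() compares the whole length-4 vector with the single value 34.
# They do not match, so it returns a character description of the
# difference rather than TRUE.
res <- all.equal(age, 34)
print(res)

# isTRUE() turns anything that is not a single TRUE into one FALSE.
flag <- isTRUE(res)

# A single value is recycled down the whole column, so every row gets FALSE.
df <- data.frame(age = age, isAge34 = flag)
print(df)
```

So the function never looks at rows individually: it answers one question about the entire column, and that one answer is repeated for each row.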

answered Oct 04 '16 at 14:36

Jim G.


I don't think defining this function is necessary (or reasonable: the only purpose of the function is to check the equality of its argument to a fixed value). @DavidArenburg's suggestion above of using `mutate(patientdata, isAge34 = age == 34)` works and is a lot simpler. – ddiez Oct 04 '16 at 14:54
@ddiez: I agree. Please see my revision. – Jim G. Oct 04 '16 at 14:58
This works for integers but you can fall into the classic floating point trap! http://www.burns-stat.com/pages/Tutor/R_inferno.pdf – Andrew Aug 04 '17 at 14:59
