When I studied Java, I learned that we avoid using == to compare reference types because == checks whether the references are the same, not whether the contents are equal. We would only use == for primitive types, because of the way they are stored in memory.

I found a somewhat similar note in the R documentation for Relational Operators:

Do not use == and != for tests, such as in if expressions, where you must get a single TRUE or FALSE. Unless you are absolutely sure that nothing unusual can happen, you should use the identical function instead.

And immediately after this, I found:

For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable.


My humble questions:

1. Do we talk about primitive types in R? If so, what are they? And can we always safely use relational operators to compare them? (furthermore, under what situations could we be "absolutely sure that nothing unusual can happen" when using relational operators?)

2. I have seen R code comparing strings (characters) using == many times. Was that code just sloppy, or is it because character/string is a primitive type in R (or something that we can always compare with relational operators)?


[Updates]

Thanks to the comments below, I realized that the first quote above mainly emphasizes the vectorized nature of R's operations rather than the accuracy of the output, and that the validity of relational operations in (base) R is highly unlikely to be affected by issues related to reference types.

Any answers or comments with further explanation or clarification are very welcome.

J-A-S
  • That "Do not use" paragraph you quoted is mostly about the fact that in R, the `==` operator is vectorized, i.e. `c(1,2,3) == c(1,2,3)` will not return `TRUE`, it will return a vector like `c(TRUE,TRUE,TRUE)`. And an `if(c(1,2,3) == c(1,2,3)) {}` construct will raise a warning for the same reason. – Vasily A Oct 31 '20 at 11:32
  • Also, check [this](https://stackoverflow.com/a/9508558/10802499) for more explanations about R's floating-point representation. – ekoam Oct 31 '20 at 12:35
  • @ekoam @Vasily A Thank you very much for pointing those out! So may I ask whether we need to worry about things like "how reference types behave with respect to the `==` operator" in R programming? – J-A-S Oct 31 '20 at 12:47
  • At least in base R, there is no such thing (AFAIK) as comparison by reference. `x == y` gives you `TRUE` when the values are equal, even if they are stored at different addresses. However, you shouldn't think of this operator as Java's `==` minus the reference comparison; the two differ greatly. Note that `0 == "0"` in R also gives you `TRUE` because of type coercion. – ekoam Oct 31 '20 at 13:08
  • I never realized `==` is vectorized. TIL. Thank you for asking this question. – Dunois Oct 31 '20 at 14:19
  • @Dunois Not 100% sure but I think all calculations in R tend to be vectorized in order to minimize the need for loops. Anyway, good to hear that this question is helpful for others :) – J-A-S Oct 31 '20 at 14:32
  • Adding to @VasilyA's comment: a workaround could be the `all` function, which takes a logical vector as its argument and returns `TRUE` if all of its elements are `TRUE` (basically `&`ing all the vector's elements). It might also be interesting to look at the `any` function, which returns `TRUE` if any element is `TRUE` (basically `|`ing the elements of the logical vector). – Abdessabour Mtk Oct 31 '20 at 14:39
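
The points raised in the comments above (elementwise comparison, `all`/`any`, and type coercion) can be illustrated with a small sketch. The variable names are only illustrative; all functions used are from base R. Note that passing a condition of length greater than one to `if` was a warning in older R versions and, as far as I know, is an error since R 4.2.0, so that call is deliberately left out here.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

elementwise <- x == y           # logical vector, one result per element
all_equal   <- all(x == y)      # TRUE only if every element matches
any_equal   <- any(x == y)      # TRUE if at least one element matches
same_object <- identical(x, y)  # always a single TRUE or FALSE

print(elementwise)  # TRUE TRUE FALSE
print(all_equal)    # FALSE
print(any_equal)    # TRUE
print(same_object)  # FALSE

# Type coercion: the number 0 is converted to the string "0" before comparing
coerced <- 0 == "0"
print(coerced)            # TRUE
print(identical(0, "0"))  # FALSE, because the types differ
```

For an `if` condition, `identical(x, y)` or `all(x == y)` gives the single `TRUE` or `FALSE` that the documentation asks for, although `all(x == y)` can still return `NA` when the inputs contain missing values.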

1 Answer

Ordinary R objects such as vectors are not handled through references or pointers at the language level (environments being a notable exception). Therefore, a distinction between primitive types and reference types is not meaningful for them, and the pitfalls known, e.g., from Java or C++ of comparing a reference address instead of its content don't occur in R.

The warnings regarding the use of == for comparisons in R refer mainly to problems of finite precision that arise when checking floating-point values for equality, which can lead to counterintuitive results. Other possible mistakes connected with the == operator in R concern the elementwise (rather than global) comparison of vectors, as was mentioned in the comments.
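
As a minimal sketch of the floating-point issue, using only base R functions (the variable names are illustrative, and `isTRUE(all.equal(...))` is one common way to combine the two functions the documentation mentions):

```r
a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                   # compares the stored doubles exactly
approx <- isTRUE(all.equal(a, b))  # compares within a small numeric tolerance
strict <- identical(a, b)          # exact comparison, always a single TRUE/FALSE

print(exact)   # FALSE
print(approx)  # TRUE
print(strict)  # FALSE
```

`all.equal` returns a character description of the difference rather than `FALSE` when the values differ, which is why it is wrapped in `isTRUE` before being used as a condition.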

There are of course different "primitive" types (called data types or "atomic" modes) in R, such as numeric, logical, character, and complex. But it is usually not even necessary to specify the type of a variable because it is automatically deduced from the assigned value.
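
A short sketch of these atomic types, and of why comparing character values with == is not sloppy in the Java sense. The sample values and variable names are made up for illustration; `typeof`, `vapply`, `paste0`, and `identical` are base R functions.

```r
# The type is inferred from the assigned value; no declaration is needed
values <- list(1.5, 2L, TRUE, "a", 1i)
types  <- vapply(values, typeof, character(1))
print(types)  # "double" "integer" "logical" "character" "complex"

# Strings are compared by content, element by element
same_string <- "apple" == "apple"
pairwise    <- c("a", "b") == c("a", "c")
print(same_string)  # TRUE
print(pairwise)     # TRUE FALSE

# Two strings built in different ways still compare equal by content
s1 <- "hello"
s2 <- paste0("hel", "lo")
print(s1 == s2)           # TRUE
print(identical(s1, s2))  # TRUE
```

The remaining caveats for strings are the general ones: == is elementwise, and it returns `NA` when either side is `NA`.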

RHertel
  • Thank you for your answer! This may not be relevant to the question, but following what you said in the last paragraph, may I ask, in R, what are the differences between *type* and *class*? (as we have the *typeof()* function and the *class()* function and the outputs can be different.) I did look this up, but haven't found a satisfactory answer that is *practical* enough... – J-A-S Nov 01 '20 at 08:33
  • No worries, thanks for your explanations :) I'll probably do more research on these and ask in a separate question if needed – J-A-S Nov 01 '20 at 09:03
  • @J-A-S That sounds like a good idea. If you don't mind, I'll remove my last comment. I feel that it's not accurate enough, and I'm not even sure it is correct. – RHertel Nov 01 '20 at 09:11
  • Sure, but anyway I think that was insightful haha. I'll @ you (probably here) to let you know when I have a certain answer for that one day – J-A-S Nov 01 '20 at 11:19