
I'm reading Hadley Wickham's Advanced R section on coercion, and I can't understand the result of this comparison:

"one" < 2
# [1] FALSE

I'm assuming that R coerces 2 to a character, but I don't understand why R returns FALSE instead of returning an error. This is especially puzzling to me since

-1 < "one"
# [1] TRUE

So my question is two-fold: first, why does R give this answer, and second, is there a way to see how R converts the individual elements in comparisons like these?

Henrik
JoeF
  • hint: `sort(c("one","2","-1","10"))` (the 10 is not necessary but might give you another example to ponder: `"10"<"2"`) – Ben Bolker Nov 18 '14 at 22:28
  • These kinds of quirks appear in all programming languages. Always take care with the data types you are working with, and compare apples with apples and oranges with oranges. – Elzo Valugi Nov 18 '14 at 22:34
  • Coercion is widespread in R. It's a weakly typed language. If you don't like that feature, then use Java. When you compare characters with `<`, it is initially a comparison of the locale-specific collation order of the first character of each string. Look at: `'\t9999' < " 00000"` – IRTFM Nov 18 '14 at 22:38
  • Thanks for the hint. I can see how "sort" reveals what the answer would be, but I still don't understand how sort itself is working. (I tried the help for sort, but didn't find an explanation there for this particular issue.) OK, based on BondedDust's comment, I see why "10" < "2". – JoeF Nov 18 '14 at 22:41
  • Related: http://stackoverflow.com/questions/14932015/why-does-true-true-in-r/14932160#14932160; http://stackoverflow.com/questions/18964562/why-does-1-99-999-1-99-999-in-r-but-100-000-100-000/18964626#18964626 – Henrik Nov 18 '14 at 22:42
  • In my locale it's just a lexical sort (alphabetic) using the ASCII numbers underlying the characters. That's why I gave the example with the first character as a tab. (But I do see that some of those low ASCII values are not handled as I would have expected.) The help page for `?Comparison` cites this as the reference for R's behavior: http://site.icu-project.org/ – IRTFM Nov 18 '14 at 22:49

2 Answers


From help("<"):

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.

So in this case, the numeric is of lower precedence than the character. So 2 is coerced to the character "2". Comparison of strings in character vectors is lexicographic which, as I understand it, is alphabetic but locale-dependent.
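To see the coercion for yourself (the second part of the question), one option is to perform the conversion explicitly. This is only a minimal sketch, not an official inspection tool: it assumes that `as.character()` and `c()` apply the same numeric-to-character conversion that `<` applies internally, and the variable names are purely illustrative.

```r
# Explicitly convert the numeric operands to see the strings being compared.
as.character(2)    # "2"
as.character(-1)   # "-1"

# c() follows the same type hierarchy, so mixing types reveals the common type.
mixed <- c("one", 2)
mixed              # "one" "2"
class(mixed)       # "character"

# The mixed comparison gives the same result as the all-character one.
same_as_string <- identical("one" < 2, "one" < "2")
same_as_string     # TRUE

# Strings are compared character by character, not by numeric value.
ten_before_two <- "10" < "2"
ten_before_two     # TRUE, because "1" sorts before "2"
```

So `"one" < 2` is really `"one" < "2"`, and `-1 < "one"` is really `"-1" < "one"`.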

Chris Cirefice
jdharrison

R coerces 2 to the character "2", then does an alphabetical comparison using the locale's collation order. In common locales, digit characters sort before letters.

To get a general idea of the behavior, try:

'a'<'1'
'1'<'.'
'b'<'B'
'a'<'B'
'A'<'B'
'C'<'B'
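
Because the results depend on the collation order of your locale, some of the comparisons above can differ between machines. As a rough sketch, you can temporarily switch to the C locale, where (as I understand it) strings are compared by their underlying byte values, to get a reproducible baseline. The variable names here are just for illustration.

```r
# Remember the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch to the C locale: comparison follows the byte (ASCII) values.
invisible(Sys.setlocale("LC_COLLATE", "C"))

c_sorted <- sort(c("a", "B", "1", ".", "A"))
c_sorted           # "." "1" "A" "B" "a"

a_before_B <- "a" < "B"
a_before_B         # FALSE in the C locale: "a" is 97, "B" is 66

# The code points behind that result.
utf8ToInt("aB")    # 97 66

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Running `"a" < "B"` again after restoring your own locale shows whether your locale orders letters by case or alphabetically.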
rawr
OganM
  • I wonder if there's any general (i.e., universal across locales) guarantee that numbers come before alphabetical characters in the collation order? – Ben Bolker Nov 18 '14 at 22:40
  • Thanks. I would mark this one and the one by jdharrison as both being correct, but it appears I can only mark one of them that way. – JoeF Nov 18 '14 at 22:51