7

I'm currently taking the course: Introduction to R on DataCamp and in one exercise (Battle of the sexes) there is an instruction like this:

Read the code in the editor and click 'Submit Answer' to test if male is greater than (>) female

The above instruction inspired me to test the following code in RStudio:

'Male' > 'Female'

To my surprise, R gave me the output TRUE! I also tried it in Excel and VBA, and both returned TRUE as well! Now I'm beginning to think that they're programming languages with sexist views (just kidding, hehe...).

So I wonder, what really happened here? Could anyone here explain it to me? Does this hold TRUE, too, for other programming languages? Why?

  • 22
    Actually, you are comparing text. In the alphabet, M comes after F; that's why it gives TRUE. – Zaid Mirza Mar 06 '18 at 07:46
  • 1
    In PHP you get the same result; as Zaid explained, it's a matter of letters and the ASCII table. But you can do it like this: https://3v4l.org/ZasdU. This works because the ASCII table has capital letters first, so "f" is larger than "M". – Andreas Mar 06 '18 at 07:54
  • I'm new to programming, but why are people here downvoting my question? Am I asking the wrong kind of question here? – Anastasiya-Romanova 秀 Mar 06 '18 at 08:07
  • 3
    @Anastasiya-Romanova秀 It can be for various reasons. I didn't downvote, but if I had, it would be because the question is really broad (you asked about string comparison across all programming languages) or because it lacks proper research (I'm no google-fu master, but I found relevant info pretty fast). Though the research part is a bit unfair, since you DID do research by comparing strings in 3 different languages. Some might downvote because of images of code instead of code snippets (lol). – AntiDrondert Mar 06 '18 at 08:10
  • @AntiDrondert I believe English websites have many resources that could help me answer this question, but English is not my first language, so sometimes it's hard for me to find suitable keywords to search for on Google. I did try, but to no avail. – Anastasiya-Romanova 秀 Mar 06 '18 at 08:22
  • 6
    "Male" > "Female", but also "Mother" > "Father" and "Son" > "Daughter", but "Girl" > "Boy". So we can safely assume, that computers are as confused with gender as we are. – Toon Krijthe Mar 06 '18 at 11:23
  • 1
    Possible duplicate of [Why is one string greater than the other when comparing strings in JavaScript?](https://stackoverflow.com/questions/7087811/why-is-one-string-greater-than-the-other-when-comparing-strings-in-javascript) – phuclv Mar 06 '18 at 11:49
  • [How to explain sorting (numerical, lexicographical and collation) with examples to non technical testers?](https://stackoverflow.com/q/6810619/995714), [Definition of a lexicographical order](https://stackoverflow.com/q/47478926/995714), [comparing 2 strings alphabetically](https://stackoverflow.com/q/10198257/995714) better dupe: [String Compare “Logic”](https://stackoverflow.com/q/1863028/995714) – phuclv Mar 06 '18 at 11:52
  • 4
    Possible duplicate of [String Compare "Logic"](https://stackoverflow.com/questions/1863028/string-compare-logic) – ederag Mar 06 '18 at 11:58

4 Answers

16

For R, see help('>') or its documentation here, and the Wikipedia link about collation:

"Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z."

To summarize: in your locale, 'F' comes before 'M' in the collation sequence, and thus Mxxx is larger than Fyyy.
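
To illustrate how the collating sequence can change the outcome, here is a minimal sketch in Python rather than R. It does not use R's locale machinery. `TOY_ALPHABET`, `toy_key` and `toy_greater` are hypothetical names invented for this example, and the toy alphabet only imitates the Estonian quirk quoted above (Z placed between S and T); it is not a real locale.

```python
# Hypothetical toy alphabet: like the Estonian example, Z sits between S and T.
# This is an illustration only, not a real locale definition.
TOY_ALPHABET = "ABCDEFGHIJKLMNOPQRSZTUVWXY"


def toy_key(word):
    """Map each letter to its position in TOY_ALPHABET (case-insensitive)."""
    return [TOY_ALPHABET.index(ch) for ch in word.upper()]


def toy_greater(a, b):
    """Compare two words lexicographically using the toy collation order."""
    return toy_key(a) > toy_key(b)


# Python's built-in > on strings compares Unicode code points.
print("Male" > "Female")              # True: 'M' (77) comes after 'F' (70)
print("Zebra" > "Tiger")              # True: 'Z' (90) comes after 'T' (84)

# With the toy collation order, Z now sorts before T.
print(toy_greater("Zebra", "Tiger"))  # False
print(toy_greater("Male", "Female"))  # True: M still comes after F
```

In both orderings M comes after F, which is why the original comparison gives TRUE, but other pairs can flip depending on the collating sequence in use.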

Florian
  • 2
    Did you mean "...and thus `Fyyy` is smaller than `Myyy`", or perhaps "...and thus `Myyy` is larger than `Fyyy`"? – andrew Mar 06 '18 at 08:04
  • 1
    Well-explained that in R, it is _not_ a lexicographic sort, as it is in some other environments. – Tom Blodget Mar 06 '18 at 11:52
4

In other languages like C# you can't compare strings with

   "Male" > "Female"
Simon H.
4

VBA, for example, converts the first letter to its ASCII code and then compares the codes.

MsgBox Asc("male") '= 109
MsgBox Asc("female") '= 102
MsgBox Asc("Male") '= 77
MsgBox Asc("Female") '= 70

This is why it says "male" > "female" is true. But "Male" > "female" is false.

For other languages it will be similar.
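
As one example of another language, the same check can be sketched in Python, where `ord` plays roughly the role of `Asc`/`AscW` (it returns the Unicode code point). `first_code` is a hypothetical helper introduced only for this illustration. The last line shows that when the first letters are equal, the comparison moves on to the following characters.

```python
def first_code(word):
    """Return the Unicode code point of the first character of word."""
    return ord(word[0])


for word in ("male", "female", "Male", "Female"):
    print(word, first_code(word))
# male 109
# female 102
# Male 77
# Female 70

print("male" > "female")  # True:  109 > 102
print("Male" > "female")  # False: 77 < 102, capitals come before lowercase
print("Man" > "Male")     # True:  'M' and 'a' tie, then 'n' (110) > 'l' (108)
```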

sporc
  • 3
    This is correct except for the confusion about ASCII. In VBA (VB4, .NET, Java, JavaScript, …) strings are sequences of UTF-16 code units, one or two encoding a Unicode codepoint. The AscW function just takes the first UTF-16 code unit. The Asc function takes the first UTF-16 code unit and transcodes it to the user's machine's current default ANSI encoding (which won't be ASCII). For a consistent character code across machines, users and time, programs use AscW. – Tom Blodget Mar 06 '18 at 12:11
2

In “less flexible” programming languages, you can’t use the “>” or “<” operators to compare strings.

In “more flexible” programming languages such as VBA, where you can write:

b = "3"
a = 5 + b
' a = 8 (implicit conversion of string to number)

... strings are evaluated by their ordinals (the numeric values associated with letters in the ASCII table) when you apply the greater-than or less-than operator. And since “M” comes after “F” in the alphabet (and so has a higher ordinal), the string comparison gives you that result.
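
Implicit conversion and string comparison do not always go together, though. As a hedged illustration, Python refuses to add a number and a string, yet still compares strings by ordinal with `>`. `try_add` is a hypothetical helper written only for this sketch.

```python
def try_add(number, text):
    """Return number + text, or the name of the exception Python raises."""
    try:
        return number + text
    except TypeError as exc:
        return type(exc).__name__


print(try_add(5, "3"))      # TypeError: no implicit string-to-number conversion
print(5 + int("3"))         # 8: the conversion has to be explicit
print("Male" > "Female")    # True: 'M' (77) comes after 'F' (70)
print("Madame" > "Hombre")  # True: 'M' (77) comes after 'H' (72)
```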

If you want it more feminist, you can compare “Madame” (woman in French) and “Hombre” (man in Spanish) :)

Matteo NNZ