I'm sure this has a simple answer, but how does one compare two strings while ignoring case in Julia? I've hacked together a rather inelegant solution:

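# Compare two strings of the same type, ignoring case.
# Julia 1.x syntax: the type parameter goes in a `where` clause.
function case_insensitive_match(a::S, b::S) where {S<:AbstractString}
    lowercase(a) == lowercase(b)
end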

There must be a better way!

digbyterrell
  • Actually, it looks like a universal approach; what would you expect? – Wiktor Stribiżew Sep 08 '16 at 20:02
  • If you want a more "C-like" string comparison returning -1, 0, or 1, then use `cmp` instead of `==`. But otherwise, yes, this seems like the best way to do a case-insensitive comparison. Not inelegant at all. – Tasos Papastylianou Sep 08 '16 at 21:02
  • Strings are a messy business. When comparing strings, a _good_ solution would not a) allocate or b) compare beyond the first difference. But most important is correctness (and the flexibility to adhere to the desired level of pedantry). Even correctness is hard to define, because strings are messy, but this is _important_ for programmers and for Julia. – Dan Getz Sep 09 '16 at 15:32

1 Answer

Efficiency Issues

The method you have chosen will work well in most settings, and you are unlikely to find anything substantially more efficient. The reason is that capital and lowercase letters are stored as different characters with different bit patterns. There is no separate capitalization field on a character that you could simply ignore when comparing strings. Fortunately, for ASCII letters the two cases differ in only a single bit, so the conversion is simple and efficient. See this SO post for background:

How do uppercase and lowercase letters differ by only one bit?
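
As a small illustration of the point above, here is a sketch that checks the one-bit difference for a single ASCII letter. It only covers ASCII; the variable names are made up for the example, and nothing here is meant as a replacement for lowercase().

```julia
# ASCII 'A' is 0x41 and 'a' is 0x61, so they differ only in the 0x20 bit.
upper = UInt8('A')
lower = UInt8('a')
bit_difference = xor(upper, lower)

# Setting that bit on an ASCII capital letter yields its lowercase form.
# (This trick is only valid for the ASCII letters A-Z.)
lowered = Char(upper | 0x20)

println(bit_difference == 0x20)
println(lowered == lowercase('A'))
```

For non-ASCII text the mapping between cases is table-driven rather than a single bit flip, which is one reason the accuracy issues below matter.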

Accuracy Issues

In most settings, the method that you have will work accurately. But if you encounter characters such as capital vs. lowercase Greek letters, it could fail. For that, you would be better off with the Unicode.normalize function (called normalize_string in older Julia versions; see the docs for details) with the casefold option:

using Unicode
Unicode.normalize("ad", casefold=true)
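
To turn the call above into a comparison, one option is to fold both strings and compare the results. The following is a minimal sketch, assuming Julia 1.x with the Unicode standard library; the helper name casefold_equal is hypothetical and not part of any library.

```julia
using Unicode  # standard library in Julia 1.x

# Hypothetical helper: apply Unicode case folding to both strings,
# then compare the folded results for equality.
function casefold_equal(a::AbstractString, b::AbstractString)
    Unicode.normalize(a, casefold=true) == Unicode.normalize(b, casefold=true)
end

println(casefold_equal("Julia", "JULIA"))
println(casefold_equal("Julia", "Python"))
```

Like the lowercase() version, this allocates two new strings per comparison, so it trades some efficiency for more careful handling of non-ASCII text.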

The pertinent issues are covered in this SO post, written in the context of Python, so I won't repeat them here:

How do I do a case-insensitive string comparison?

Since that post deals with the underlying issues of Unicode text, it applies to Julia as well as Python.

See also this Julia GitHub discussion for additional background and specific examples of places where lowercase() can fail:

https://github.com/JuliaLang/julia/issues/7848

wueli
Michael Ohlrogge