
To compare two Strings ignoring case, it looks like I first need to convert each one to lowercase:

let a_lower = a.to_lowercase();
let b_lower = b.to_lowercase();
a_lower.cmp(&b_lower)

Is there a method that compares strings ignoring case without creating temporary lowercase strings, i.e. one that iterates over the characters, converts each to lowercase on the fly and compares the results?

Shepmaster
Thomas S.

3 Answers


If you are only working with ASCII, you can use eq_ignore_ascii_case:

assert!("Ferris".eq_ignore_ascii_case("FERRIS"));
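Note that `eq_ignore_ascii_case` only folds the ASCII letters A to Z; every other byte must match exactly. A few checks sketching where that boundary lies:

```rust
fn main() {
    // ASCII letters are folded, so these match.
    assert!("Ferris".eq_ignore_ascii_case("FERRIS"));
    // Non-ASCII letters are compared byte for byte, so 'É' and 'é' do not match.
    assert!(!"Éclair".eq_ignore_ascii_case("éCLAIR"));
    // Digits and punctuation have no case and are compared exactly.
    assert!("file-01.TXT".eq_ignore_ascii_case("FILE-01.txt"));
    println!("all checks passed");
}
```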
Ibraheem Ahmed
  • If it is about strings from, e.g., user input or file names, this does not help. How often are strings only used within the application and fully under the programmer's control? – Thomas S. Jan 31 '21 at 11:56
  • @ThomasS. True, but it is still a valid answer and may be useful to some people. – Ibraheem Ahmed Jan 31 '21 at 15:37
    This was exactly what I needed. I am comparing a set of constant ASCII strings against whatever the user input is, so if the input isn't ASCII, it wouldn't be a match anyway. – Steven Turner Dec 24 '22 at 00:46
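The use case in the last comment (matching user input against a fixed set of ASCII strings) could look roughly like this. `KEYWORDS` and `match_keyword` are hypothetical names used only for illustration:

```rust
/// Hypothetical set of ASCII command keywords, for illustration only.
const KEYWORDS: [&str; 3] = ["start", "stop", "status"];

/// Returns the canonical keyword matching `input`, ignoring ASCII case.
/// Non-ASCII input can never equal an ASCII keyword, so it yields `None`.
fn match_keyword(input: &str) -> Option<&'static str> {
    KEYWORDS
        .iter()
        .copied()
        .find(|kw| kw.eq_ignore_ascii_case(input))
}

fn main() {
    assert_eq!(match_keyword("STOP"), Some("stop"));
    assert_eq!(match_keyword("Status"), Some("status"));
    // 'ö' is not ASCII, so it cannot match "stop".
    assert_eq!(match_keyword("stöp"), None);
}
```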

There is no built-in method, but you can write one to do exactly as you described, assuming you only care about ASCII input.

use itertools::{EitherOrBoth::*, Itertools as _}; // 0.9.0
use std::cmp::Ordering;

fn cmp_ignore_case_ascii(a: &str, b: &str) -> Ordering {
    a.bytes()
        .zip_longest(b.bytes())
        .map(|ab| match ab {
            Left(_) => Ordering::Greater,
            Right(_) => Ordering::Less,
            Both(a, b) => a.to_ascii_lowercase().cmp(&b.to_ascii_lowercase()),
        })
        .find(|&ordering| ordering != Ordering::Equal)
        .unwrap_or(Ordering::Equal)
}
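If you'd rather avoid the itertools dependency, the standard library's `Iterator::cmp` already compares lexicographically, stops at the first differing element, and treats a shorter prefix as `Less`. A sketch of an equivalent function under that approach:

```rust
use std::cmp::Ordering;

/// ASCII case-insensitive ordering using only the standard library.
/// Each byte is lowercased lazily; no temporary strings are allocated.
fn cmp_ignore_case_ascii(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

fn main() {
    assert_eq!(cmp_ignore_case_ascii("Ferris", "FERRIS"), Ordering::Equal);
    assert_eq!(cmp_ignore_case_ascii("apple", "Banana"), Ordering::Less);
    // "crab" extends "cra", so the longer string sorts after it.
    assert_eq!(cmp_ignore_case_ascii("Crab", "cra"), Ordering::Greater);
}
```

This should give the same results as the `zip_longest` version: an exhausted left side maps to `Less` and an exhausted right side to `Greater`.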

As some comments below point out, case-insensitive comparison will not work properly for arbitrary Unicode text without operating on the whole string, and even then some case conversions have multiple representations, which can give unexpected results.

With those caveats, the following handles many more cases than the ASCII version above (e.g. most accented Latin characters) and may be satisfactory, depending on your requirements:

fn cmp_ignore_case_utf8(a: &str, b: &str) -> Ordering {
    a.chars()
        .flat_map(char::to_lowercase)
        .zip_longest(b.chars().flat_map(char::to_lowercase))
        .map(|ab| match ab {
            Left(_) => Ordering::Greater,
            Right(_) => Ordering::Less,
            Both(a, b) => a.cmp(&b),
        })
        .find(|&ordering| ordering != Ordering::Equal)
        .unwrap_or(Ordering::Equal)
}
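The same per-char approach can also be written with only the standard library, again relying on `Iterator::cmp`. This sketch also shows one of the limitations mentioned above: lowercasing is not full case folding, so "ß" is not treated as equal to "ss".

```rust
use std::cmp::Ordering;

/// Lowercases char by char and compares lexicographically, using only std.
/// `char::to_lowercase` returns a small iterator (usually a single char),
/// so no full lowercase copy of either string is built.
fn cmp_ignore_case_utf8(a: &str, b: &str) -> Ordering {
    a.chars()
        .flat_map(char::to_lowercase)
        .cmp(b.chars().flat_map(char::to_lowercase))
}

fn main() {
    // Precomposed accented Latin letters fold as expected.
    assert_eq!(cmp_ignore_case_utf8("ÉCOLE", "école"), Ordering::Equal);
    assert_eq!(cmp_ignore_case_utf8("apple", "Banana"), Ordering::Less);
    // Limitation: 'ß' lowercases to itself, so it does not match "ss".
    assert_ne!(cmp_ignore_case_utf8("STRASSE", "straße"), Ordering::Equal);
}
```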
Peter Hall
    Any method that uses `str::chars` will not compare Unicode strings properly. – mcarton Sep 13 '20 at 15:40
  • To complement what I believe mcarton is talking about: `str::chars` iterates over code points, but because of *precomposition* it's possible to have strings which are *canonically equivalent* but differ at a technical level. `chars` does not take that into account. Another large issue is that case conversion is locale-dependent, e.g. the lowercase of `I` is `i`… unless you're using a Turkic language, where it's ı. I'm sure there are other pitfalls too. – Masklinn Sep 13 '20 at 16:07
  • I'm not trying to compare for equality, but for <, == or >. BTW, does `string.to_lowercase()` consider the locale, so that one char could become multiple chars? – Thomas S. Sep 13 '20 at 17:33
  • @ThomasS. please read [Why is capitalizing the first letter of a string so convoluted in Rust?](https://stackoverflow.com/q/38406793/155423) – Shepmaster Sep 14 '20 at 11:57
    "assuming you only care about ASCII input" is awfully bad practice these days. – Jack Aidley Sep 15 '20 at 09:33
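The points raised in these comments can be checked directly with std. The sketch below shows that canonically equivalent strings still differ char by char, that std's lowercase mapping does not depend on locale, and that a single char can still lowercase to more than one char:

```rust
fn main() {
    // Precomposed U+00E9 versus 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // canonically equivalent, but different char sequences.
    let precomposed = "\u{E9}";
    let decomposed = "e\u{301}";
    assert_ne!(precomposed, decomposed);
    // A chars()-based case-insensitive comparison therefore sees them as different.
    assert!(precomposed
        .chars()
        .flat_map(char::to_lowercase)
        .ne(decomposed.chars().flat_map(char::to_lowercase)));

    // One char can lowercase to several: U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
    // becomes 'i' followed by U+0307 COMBINING DOT ABOVE.
    assert_eq!('\u{130}'.to_lowercase().to_string(), "i\u{307}");

    // std's mapping is locale-independent: 'I' always becomes 'i', even for Turkish text.
    assert_eq!('I'.to_lowercase().to_string(), "i");
}
```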

UNICODE

The best way to support Unicode is to use to_lowercase() or to_uppercase().

This is because Unicode has many caveats and these functions handle most situations. Some locale-specific cases are still not handled correctly.

let left = "first".to_string();
let right = "FiRsT".to_string();

assert!(left.to_lowercase() == right.to_lowercase());
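One concrete case where whole-string `str::to_lowercase` does better than lowercasing char by char is the Greek final sigma: `str::to_lowercase` looks at the surrounding letters and turns a word-final Σ into ς, while `char::to_lowercase` has no context and always produces σ. A small sketch of how that affects a comparison:

```rust
fn main() {
    let word = "ΟΔΟΣ";
    // Whole-string lowercasing applies the final-sigma rule: the last letter becomes 'ς'.
    let lower = word.to_lowercase();
    assert!(lower.ends_with('ς'));

    // Comparing via whole-string to_lowercase: the word equals its own lowercase form.
    assert_eq!(lower.to_lowercase(), word.to_lowercase());

    // Comparing char by char: 'ς' stays 'ς' but 'Σ' becomes 'σ', so they differ.
    let per_char_equal = lower
        .chars()
        .flat_map(char::to_lowercase)
        .eq(word.chars().flat_map(char::to_lowercase));
    assert!(!per_char_equal);
}
```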

Efficiency

It is possible to iterate and return at the first non-equal character, so in essence you only convert one character at a time instead of allocating whole lowercase copies. However, iterating with the chars function does not account for every situation Unicode can throw at us.

See the answer by Peter Hall for details on this.
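One possible compromise between efficiency and Unicode handling is a hybrid: use a non-allocating ASCII comparison when both strings are pure ASCII, and fall back to the allocating whole-string `to_lowercase` otherwise. The function name `cmp_ignore_case` is hypothetical; this is a sketch, not a standard API:

```rust
use std::cmp::Ordering;

/// Hypothetical helper: ASCII fast path with no allocation, falling back to
/// whole-string `to_lowercase` (which allocates) when either side is non-ASCII.
fn cmp_ignore_case(a: &str, b: &str) -> Ordering {
    if a.is_ascii() && b.is_ascii() {
        // For ASCII input, to_ascii_lowercase gives the same result as to_lowercase.
        a.bytes()
            .map(|c| c.to_ascii_lowercase())
            .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
    } else {
        a.to_lowercase().cmp(&b.to_lowercase())
    }
}

fn main() {
    assert_eq!(cmp_ignore_case("Ferris", "FERRIS"), Ordering::Equal);
    assert_eq!(cmp_ignore_case("Zebra", "apple"), Ordering::Greater);
    assert_eq!(cmp_ignore_case("ÉCOLE", "école"), Ordering::Equal);
}
```

The trade-off is that `is_ascii` scans both strings up front, so the fast path wins mainly when most input really is ASCII.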

ASCII

If you only need ASCII, the most efficient option is eq_ignore_ascii_case (as per Ibraheem Ahmed's answer). It does not allocate or copy temporaries.

This is only good if your code controls at least one side of the comparison and you are certain that it will only include ASCII.

assert!("Ferris".eq_ignore_ascii_case("FERRIS"));

Locale

Rust's case functions use the default, locale-independent Unicode case mappings, so they do not handle locale-specific rules such as the Turkish dotless ı. To support proper internationalisation, you will need to look for other crates that do this.
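To illustrate what locale tailoring means, here is a deliberately incomplete sketch of Turkish-style lowercasing layered on top of std. `turkish_lowercase` is a hypothetical helper that only special-cases the two capital I letters; it is not a substitute for a real internationalisation crate:

```rust
/// Hypothetical helper, NOT a complete Turkish/Azeri implementation:
/// it only tailors the dotless and dotted capital I and otherwise falls
/// back to std's locale-independent `char::to_lowercase`.
fn turkish_lowercase(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            // 'I' (dotless capital) lowercases to U+0131 'ı' in Turkish.
            'I' => out.push('\u{131}'),
            // U+0130 'İ' (dotted capital) lowercases to plain 'i' in Turkish.
            '\u{130}' => out.push('i'),
            // Everything else uses the default mapping, char by char
            // (so, e.g., the Greek final-sigma rule is not applied here).
            _ => out.extend(c.to_lowercase()),
        }
    }
    out
}

fn main() {
    // std gives 'i'; the tailored version gives dotless 'ı'.
    assert_eq!("I".to_lowercase(), "i");
    assert_eq!(turkish_lowercase("I"), "\u{131}");
    assert_eq!(turkish_lowercase("\u{130}STANBUL"), "istanbul");
}
```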

Steven Turner