372

What is the best way of doing case-insensitive string comparison in C++ without transforming a string to all uppercase or all lowercase?

Please indicate whether the methods are Unicode-friendly and how portable they are.

Toby Speight
Adam
  • In c, one usually was forced toupper the whole string then compare that way - or roll your own compare :P – Michael Dorgan May 22 '10 at 01:10
  • @[Adam](#11679): While this variant is good in terms of usability it's bad in terms of performance because it creates unnecessary copies. I might overlook something but I believe the best (non-Unicode) way is to use `std::stricmp`. Otherwise, read what Herb [has to say](http://www.gotw.ca/gotw/029.htm). – Konrad Rudolph Aug 26 '08 at 12:17
  • a later question has a simpler answer: strcasecmp (at least for BSD & POSIX compilers) http://stackoverflow.com/questions/9182912/case-insensitive-string-comparison-c – Móż Nov 05 '13 at 21:39
  • @Mσᶎ this question also has that answer, with the important caveat that `strcasecmp` is not part of the standard and is missing from at least one common compiler. – Mark Ransom Dec 01 '14 at 19:57

30 Answers

331

Boost includes a handy algorithm for this:

#include <boost/algorithm/string.hpp>
// Or, for fewer header dependencies:
//#include <boost/algorithm/string/predicate.hpp>

std::string str1 = "hello, world!";
std::string str2 = "HELLO, WORLD!";

if (boost::iequals(str1, str2))
{
    // Strings are equal, ignoring case
}
Josh Kelley
Rob
  • 16
    Is this UTF-8 friendly? I think not. – vladr Oct 30 '10 at 00:23
  • 20
    No, because UTF-8 allows identical strings to be coded with different binary codes, due to accents, combines, bidi issues, etc. – vy32 Jun 18 '11 at 23:35
  • 11
    @vy32 That is absolutely incorrect! The UTF-8 combinations are mutually exclusive. It must always use shortest possible representation, if it does not, it's a malformed UTF-8 sequence or code point that must be treated with care. – Wiz Nov 10 '11 at 23:44
  • 56
    @Wiz, you are ignoring the issue of Unicode string normalization. ñ can be represented as a combining ˜ followed by an n, or with a ñ character. You need to use Unicode string normalization before performing the comparaison. Please review Unicode Technical Report #15, http://unicode.org/reports/tr15/ – vy32 Nov 11 '11 at 03:21
  • 7
    @vy32 (I never followed back on that comment) but that still doesn't mean it's not UTF-8 friendly. Comparison should undergo normalization to either fully decomposed or fully composed to eliminate such equivalence issues. Nevertheless, nothing stops that iequals function from doing that. – Wiz Jan 22 '12 at 22:36
  • 1
    Actually I'm not sure if std::basic_string is suitable for variable length encoding like UTF8. Sure it will do something. In many cases (in particular in ASCII subset) it will do properly. But I think std::basic_string (and thus all its uses) may assume fixed length encoding. But maybe its the Traits template argument that has to deal with it. – Adam Badura Jun 09 '12 at 23:42
  • There are two separate issues to look at: conversion between UTF-32 and UTF-8 (fixed and deterministic) and composition of a string out of UTF-32 code points (neither fixed nor deterministic). – Jerry Coffin Mar 21 '13 at 14:59
  • 6
    It will not work for Unicode in the general case. "ß" and "SS" should compare equal, but Boost String Algorithms doesn't handle this. – dalle Jun 05 '13 at 11:43
  • 4
    @dalle Why should "ß" and "SS" compare equal? In what usecases? Most don't want "ß" in Switzerland f.e. – wonko realtime Dec 10 '13 at 16:15
  • 14
    @wonkorealtime: because "ß" converted to uppercase is "SS": http://www.fileformat.info/info/unicode/char/df/index.htm – Mooing Duck May 29 '14 at 23:11
  • Slight problem, "#include <boost/algorithm/string/predicate.hpp>" didn't work because it couldn't resolve the predicate, while the uncommented version did. How do I correctly utilize the minimal header dependency version? – Andrew Hundt Jun 30 '15 at 18:57
  • 4
    Also note, that casing can be language specific as well, even in Unicode. i.e. In Turkish, uppercase of U+0069 (lowercase i) is U+0130 (uppercase I with dot) and not U+0049 (Uppercase I). There is not many of them, but ftp://unicode.org/Public/UNIDATA/SpecialCasing.txt – Rahly Jun 12 '17 at 10:18
  • Definitely **not** Unicode-friendly. Another example is Greek uppercase 'Σ' that converts to either lowercase 'σ' or 'ς' depending on word position. `boost::iequals` defers to `std::locale()` which is unable to handle these things. Anything not ICU is, at this point of writing, lying through its teeth. – DevSolar Feb 07 '20 at 12:57
  • 2
    OP didn't ask how to do it with Boost. – Craig B Nov 23 '20 at 23:41
  • Caveat: On MSVC at least, boost::iequals() calls std::facet(), which invokes a mutex lock, and is orders of magnitude slower than a custom comparator, especially in a multithreaded environment. – Charles Savoie Aug 05 '21 at 15:56
128

The trouble with Boost is that you have to depend on it, which is not easy in some cases (e.g. Android). The string algorithms themselves are header-only, so there is nothing to link.

And using char_traits means all your comparisons are case insensitive, which isn't usually what you want.

This should suffice and be reasonably efficient. It does not handle Unicode, though.

bool iequals(const string& a, const string& b)
{
    const std::size_t sz = a.size();
    if (b.size() != sz)
        return false;
    for (std::size_t i = 0; i < sz; ++i)
        // Cast to unsigned char first: tolower on a negative char is undefined.
        if (tolower(static_cast<unsigned char>(a[i])) != tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}

Update: Bonus C++14 version (#include <algorithm> and <cctype>):

bool iequals(const string& a, const string& b)
{
    return std::equal(a.begin(), a.end(),
                      b.begin(), b.end(),
                      // unsigned char parameters keep the argument to tolower non-negative
                      [](unsigned char a, unsigned char b) {
                          return tolower(a) == tolower(b);
                      });
}

Update: C++20 version using std::ranges:

#include <algorithm>
#include <cctype>
#include <ranges>
#include <string_view>

bool iequals(std::string_view lhs, std::string_view rhs) {
    // A lambda avoids taking the address of std::tolower and keeps its argument non-negative.
    auto to_lower{ std::views::transform([](unsigned char c) { return std::tolower(c); }) };
    return std::ranges::equal(lhs | to_lower, rhs | to_lower);
}
Donald Duck
Timmmm
  • 32
    Actually, the boost string library is a header only library, so there is no need to link to anything. Also, you can use boost's 'bcp' utility to copy just the string headers to your source tree, so you don't need to require the full boost library. – Gretchen Mar 09 '11 at 21:47
  • Ah I did not know about bcp, it looks really useful. Thanks for the info! – Timmmm Mar 13 '11 at 18:15
  • 11
    Good to know a simple and non-boost-dependency version. – Deqing May 17 '14 at 03:31
  • 2
    @Anna Text library of boost needs to be built and link. It uses IBM ICU. – Behrouz.M Jun 01 '15 at 06:46
  • Also available with C++11 – martian Jun 21 '18 at 18:06
  • The four-iterator overload of `std::equal` used here is not available in C++11; it was added in C++14. – Timmmm Jun 22 '18 at 18:47
  • 6
    `std::tolower` should [not](https://en.cppreference.com/w/cpp/string/byte/tolower#Notes) be called on `char` directly, a `static_cast` to `unsigned char` is needed. – Evg Sep 26 '20 at 09:50
  • @Evg That's pretty crazy. Fortunately this seems to be one of those "technically undefined behaviour but everyone does the sane thing anyway" so de facto it's probably fine. I checked GCC, Clang and ICC with `-O3` and they all do the sane thing. I imagine if you made a compiler that didn't it wouldn't be able to compile a lot of existing code. – Timmmm Mar 02 '21 at 13:54
  • 1
    In the C++14 version, it suffices to change to parameter list of the lambda function to `[](unsigned char a, unsigned char b)`, no `static_cast` is necessary. – Jonatan Lindén Sep 16 '21 at 11:08
  • 2
    @Timmmm I've taken the liberty of adding a C++20 version to this answer as I believe here is the best fit, and compared with other answers in this thread, I feel most closely resembles your other solutions. – Ben Cottrell Jan 02 '22 at 11:39
  • 1
    (Late) side note: There are quite a number of languages having different representations of small letters for capital ones (e.g. Greek `Σ` maps to `ς` at word's end and to `σ` elsewhere). While the other way round exists, too, (was that in Turkish?) this case is rarer, so chances to get correct comparison is greater with `toupper` – sure, doesn't help out if you happen to encode exactly one of the counter-example languages ;) – Aconcagua Jul 06 '22 at 16:00
  • Interesting, but that only applies to Unicode (in which case there are probably official algorithms for this). This code is just ASCII. – Timmmm Jul 07 '22 at 13:00
124

Take advantage of the standard char_traits. Recall that a std::string is in fact a typedef for std::basic_string<char>, or more explicitly, std::basic_string<char, std::char_traits<char> >. The char_traits type describes how characters compare, how they copy, how they cast etc. All you need to do is typedef a new string over basic_string, and provide it with your own custom char_traits that compare case insensitively.

struct ci_char_traits : public std::char_traits<char> {
    // Fold through unsigned char: passing a negative char to toupper is undefined.
    static int up(char c) { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char c1, char c2) { return up(c1) == up(c2); }
    static bool ne(char c1, char c2) { return up(c1) != up(c2); }
    static bool lt(char c1, char c2) { return up(c1) <  up(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        while( n-- != 0 ) {
            if( up(*s1) < up(*s2) ) return -1;
            if( up(*s1) > up(*s2) ) return 1;
            ++s1; ++s2;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        // Return a null pointer when a is not found, as char_traits::find requires.
        for( ; n != 0; --n, ++s ) {
            if( up(*s) == up(a) ) return s;
        }
        return nullptr;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

The details are on Guru of The Week number 29.
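As a rough illustration of how such a string type behaves in practice, here is a minimal, self-contained sketch. The names `ci_traits_sketch`, `ci_string_sketch`, `to_ci` and `from_ci` are made up for this example and are not from GotW #29; the sketch assumes ASCII or another single-byte encoding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, trimmed-down traits for illustration: only the comparison
// hooks are overridden; everything else is inherited from std::char_traits.
struct ci_traits_sketch : std::char_traits<char> {
    static int fold(char c) { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (; n != 0; --n, ++a, ++b) {
            if (fold(*a) < fold(*b)) return -1;
            if (fold(*a) > fold(*b)) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (; n != 0; --n, ++s)
            if (fold(*s) == fold(c)) return s;
        return nullptr;  // not found
    }
};

using ci_string_sketch = std::basic_string<char, ci_traits_sketch>;

// The two string types do not convert implicitly; the iterator-range
// constructor copies the characters across in either direction.
inline ci_string_sketch to_ci(const std::string& s) {
    return ci_string_sketch(s.begin(), s.end());
}
inline std::string from_ci(const ci_string_sketch& s) {
    return std::string(s.begin(), s.end());
}
```

With this in place, `ci_string_sketch("Hello") == "hELLO"` holds and member functions such as `find` ignore case as well. Note that `std::cout << some_ci_string` does not compile, because the stream's traits differ; convert with `from_ci` or print `c_str()` instead.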

wilhelmtell
  • 15
    As far as I know from my own experimentation, this makes your new string type incompatible with std::string. – Zan Lynx Sep 26 '12 at 21:25
  • 11
    Of course it does - for its own good. A case-insensitive string is something else: `typedef std::basic_string > istring`, not `typedef std::basic_string > string`. – Andreas Spindler Oct 09 '12 at 09:24
  • 2
    I know this is copied directly from GotW29, and I'd assume something this widely quoted was correct, but for me (on Visual Studio 2005) the find function here doesn't work. It causes the basic_string::find to overrun the buffer and crash. I had to change "return s;" to "return (n >= 0 ? s : NULL);". – njplumridge Mar 28 '13 at 11:47
  • 292
    "All you need to do..." – Tim MB Apr 19 '13 at 10:03
  • 2
    The compare() method calls toupper() twice for each character. Probably should buffer the result of toupper() to reduce CPU impact. – Nathan Feb 12 '14 at 23:30
  • 3
    @Nathan probably use a compiler that is able to perform basic CSE on the code... – The Paramagnetic Croissant Oct 12 '14 at 07:41
  • 34
    Any language construct that forces such insanity in this trivial case should and can be abandoned without regrets. – Erik Aronesty Nov 14 '14 at 14:17
  • 3
    @ErikAronesty and you would recommend...? – Big McLargeHuge Nov 15 '15 at 03:34
  • 5
    @DaveKennedy I think Erik advises abandoning human languages, as *those* are the language constructs that are forcing this insanity. :-) – srm Mar 21 '18 at 16:35
  • I'd like to point out that the second parameter of `find` should be `std::size_t`, not `int`. Unfortunately I can't edit because the question is locked. Also it's possible to implement `find` and `compare` in terms of `eq`, `lt` and `ne`. – Pharap Jul 16 '18 at 14:31
  • 1
    Though this does make the case-insensitive string incompatible with `std::string`, it's trivial to convert between the two using the range constructor. – celticminstrel Oct 11 '19 at 18:37
  • Additional followup: at a glance, this method appears to be incompatible with `std::unordered_map`. (Or at least, the implementation of the string hash in MSVC's standard library does not appear to use the char traits for anything.) So if using this with `std::unordered_map`, a specialization of `std::hash` will probably be needed too. – celticminstrel Mar 19 '20 at 19:11
  • `std::toupper` should [not](https://en.cppreference.com/w/cpp/string/byte/toupper#Notes) be called on `char` directly, a `static_cast` to `unsigned char` is needed. – Evg Sep 26 '20 at 09:49
  • With C++17 we can use a string_view to use the ci_char_traits – dodjango Oct 02 '20 at 20:58
  • `std::string s1{ "Ignore my CASE" }; std::string s2{ "ignore my case" }; std::basic_string_view<char, ci_char_traits> ci_view{ s1.c_str() }; std::cout << std::boolalpha << "\"" << s1 << "\" equals \"" << s2 << "\": " << (s1.compare(s2) == 0) << std::endl; std::cout << "\"" << s1 << "\" equals \"" << s2 << "\" (ignore casing): " << (ci_view.compare(s2.c_str()) == 0) << std::endl;` – dodjango Oct 02 '20 at 21:06
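On the `std::unordered_map` point raised in the comments: one alternative to specializing `std::hash` for a custom string type is to keep plain `std::string` keys and pass a case-folding hash and equality predicate to the container. The following is only a sketch under that assumption; `CiHash`, `CiEqual` and `CiMap` are hypothetical names, and the folding is ASCII / single-byte only.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical hash: FNV-1a over the lower-cased bytes, so that "Key" and
// "KEY" land in the same bucket.
struct CiHash {
    std::size_t operator()(const std::string& s) const {
        std::uint64_t h = 14695981039346656037ull;
        for (unsigned char c : s) {
            h ^= static_cast<std::uint64_t>(std::tolower(c));
            h *= 1099511628211ull;
        }
        return static_cast<std::size_t>(h);
    }
};

// Hypothetical equality predicate; it must agree with the hash above.
struct CiEqual {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i])))
                return false;
        }
        return true;
    }
};

using CiMap = std::unordered_map<std::string, int, CiHash, CiEqual>;
```

A `CiMap` then treats `"Content-Type"` and `"content-type"` as the same key, while the stored key keeps whatever spelling was inserted first.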
63

If you are on a POSIX system, you can use strcasecmp. This function is not part of standard C, though, nor is it available on Windows. This will perform a case-insensitive comparison on 8-bit chars, so long as the locale is POSIX. If the locale is not POSIX, the results are undefined (so it might do a localized compare, or it might not). A wide-character equivalent is not available.

Failing that, a large number of historic C library implementations have the functions stricmp() and strnicmp(). Visual C++ on Windows renamed all of these by prefixing them with an underscore because they aren’t part of the ANSI standard, so on that system they’re called _stricmp or _strnicmp. Some libraries may also have wide-character or multibyte equivalent functions (typically named e.g. _wcsicmp, _mbsicmp and so on).
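The platform split described above can be hidden behind a small wrapper. This is a sketch only: `compare_ignore_case` is a hypothetical name, and the `_WIN32` check is an assumption about how the build distinguishes the Microsoft C runtime from a POSIX one.

```cpp
#include <string>
#if defined(_WIN32)
#  include <string.h>    // _stricmp (Microsoft C runtime)
#else
#  include <strings.h>   // strcasecmp (POSIX)
#endif

// Hypothetical wrapper. Returns <0, 0 or >0. The result depends on the
// current C locale, and an embedded NUL byte ends the comparison early
// because both underlying functions work on C strings.
inline int compare_ignore_case(const std::string& a, const std::string& b) {
#if defined(_WIN32)
    return _stricmp(a.c_str(), b.c_str());
#else
    return strcasecmp(a.c_str(), b.c_str());
#endif
}
```

Callers test for equality with `compare_ignore_case(a, b) == 0`, exactly as they would with either underlying function.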

C and C++ are both largely ignorant of internationalization issues, so there's no good solution to this problem, except to use a third-party library. Check out IBM ICU (International Components for Unicode) if you need a robust library for C/C++. ICU is for both Windows and Unix systems.

al45tair
Derek Park
57

Are you talking about a dumb case insensitive compare or a full normalized Unicode compare?

A dumb compare will not find strings that might be the same but are not binary equal.

Example:

U+212B (ANGSTROM SIGN)
U+0041 (LATIN CAPITAL LETTER A) + U+030A (COMBINING RING ABOVE)
U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE).

Are all equivalent but they also have different binary representations.
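To make the "different binary representations" concrete, the sketch below spells out the UTF-8 bytes of the three sequences and compares them the way `std::string` does, byte by byte. The variable and function names are made up for illustration.

```cpp
#include <string>

// UTF-8 encodings of the three code point sequences listed above.
const std::string angstrom_sign = "\xE2\x84\xAB";  // U+212B
const std::string a_plus_ring   = "A\xCC\x8A";     // U+0041 U+030A
const std::string a_with_ring   = "\xC3\x85";      // U+00C5

// Plain byte-wise equality, which is all std::string's operator== does.
inline bool bytes_equal(const std::string& x, const std::string& y) {
    return x == y;
}
```

All three pairs compare unequal here, even though they render as the same letter. Making them compare equal requires normalizing both sides first, which the standard library does not provide.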

That said, Unicode Normalization should be a mandatory read, especially if you plan on supporting Hangul, Thai and other Asian languages.

Also, IBM pretty much patented most optimized Unicode algorithms and made them publicly available. They also maintain an implementation : IBM ICU

Community
Coincoin
  • Late comment, I know... *'Are all equivalent'* might not be fully correct, though I'm not familiar with the given case – German 'Umlaut's, though, could be created by combining `a`, `o` or `u` with diaeresis or directly via letters `ä`, `ö`, `ü` – *however* the distance of the two dots is (slightly) different (direct characters narrower)... – Aconcagua Jul 06 '22 at 16:08
34

boost::iequals is not utf-8 compatible in the case of string. You can use boost::locale.

comparator<char,collator_base::secondary> cmpr;
cout << (cmpr(str1, str2) ? "str1 < str2" : "str1 >= str2") << endl;
  • Primary -- ignore accents and character case, comparing base letters only. For example "facade" and "Façade" are the same.
  • Secondary -- ignore character case but consider accents. "facade" and "façade" are different but "Façade" and "façade" are the same.
  • Tertiary -- consider both case and accents: "Façade" and "façade" are different. Ignore punctuation.
  • Quaternary -- consider all case, accents, and punctuation. The words must be identical in terms of Unicode representation.
  • Identical -- as quaternary, but compare code points as well.
Igor Milyakov
34

My first thought for a non-unicode version was to do something like this:

bool caseInsensitiveStringCompare(const string& str1, const string& str2) {
    if (str1.size() != str2.size()) {
        return false;
    }
    for (string::const_iterator c1 = str1.begin(), c2 = str2.begin(); c1 != str1.end(); ++c1, ++c2) {
        if (tolower(static_cast<unsigned char>(*c1)) != tolower(static_cast<unsigned char>(*c2))) {
            return false;
        }
    }
    return true;
}
Shadow2531
  • 3
    `std::tolower` should [not](https://en.cppreference.com/w/cpp/string/byte/tolower#Notes) be called on `char` directly, a `static_cast` to `unsigned char` is needed. – Evg Sep 26 '20 at 09:50
  • 1
    @Evg, so `if (tolower(static_cast<unsigned char>(*c1)) != tolower(static_cast<unsigned char>(*c2)))` will do? – Shadow2531 Sep 27 '20 at 10:51
  • 1
    Yes, this should be the correct way. – Evg Sep 27 '20 at 11:54
28

You can use strcasecmp on Unix, or _stricmp on Windows.

One thing that hasn't been mentioned so far is that if you are using stl strings with these methods, it's useful to first compare the length of the two strings, since this information is already available to you in the string class. This could prevent doing the costly string comparison if the two strings you are comparing aren't even the same length in the first place.

Yu Hao
bradtgmurray
  • Since determining the length of a string consists of iterating over every character in the string and comparing it against 0, is there really that much difference between that and just comparing the strings right away? I guess you get better memory locality in the case where both strings don't match, but probably nearly 2x runtime in case of a match. – uliwitness Jan 22 '14 at 13:28
  • 6
    C++11 specifies that the complexity of std::string::length must be constant: http://www.cplusplus.com/reference/string/string/length/ – bradtgmurray Feb 04 '14 at 21:37
  • 2
    That's a fun little fact, but has little bearing here. strcasecmp() and stricmp() both take undecorated C strings, so there is no std::string involved. – uliwitness Feb 05 '14 at 17:39
  • 4
    These methods will return -1 if you compare "a" vs "ab". The lengths are different but "a" comes before "ab". So, simply comparing the lengths is not feasible if the caller cares about ordering. – Nathan Feb 12 '14 at 23:33
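The last comment's point can be made concrete: a length check is a valid shortcut for equality, but an ordering comparison has to fall back to "the shorter string sorts first" only after the common prefix matches. A minimal sketch, assuming ASCII / single-byte text; `icompare` is a hypothetical name:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns <0, 0 or >0 like strcasecmp, but works
// directly on std::string, so embedded NUL bytes are compared too.
inline int icompare(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    // The common prefix is equal: the shorter string sorts first.
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

For pure equality tests, checking `a.size() == b.size()` before calling `icompare` remains a cheap early exit.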
16

I'm trying to cobble together a good answer from all the posts, so help me edit this:

Here is a method of doing this. Although it does transform the strings and is not Unicode-friendly, it should be portable, which is a plus:

bool caseInsensitiveStringCompare( const std::string& str1, const std::string& str2 ) {
    std::string str1Cpy( str1 );
    std::string str2Cpy( str2 );
    std::transform( str1Cpy.begin(), str1Cpy.end(), str1Cpy.begin(), ::tolower );
    std::transform( str2Cpy.begin(), str2Cpy.end(), str2Cpy.begin(), ::tolower );
    return ( str1Cpy == str2Cpy );
}

From what I have read this is more portable than stricmp() because stricmp() is not in fact part of the std library, but only implemented by most compiler vendors.

To get a truly Unicode friendly implementation it appears you must go outside the std library. One good 3rd party library is the IBM ICU (International Components for Unicode)

Also boost::iequals provides a fairly good utility for doing this sort of comparison.

Adam
  • can you please tell, what does mean ::tolower, why you can use tolower instead of tolower(), and what is '::' before? thanks – VextoR Mar 11 '11 at 08:40
  • 20
    This is not a very efficient solution - you make copies of both strings and transform all of them even if the first character is different. – Timmmm Mar 13 '11 at 18:14
  • 3
    If you're going to make a copy anyway, why not pass by value instead of by reference? – celticminstrel Jun 21 '15 at 02:17
  • 1
    the question asks explicitly to not `transform` the whole string before comparison – Sandburg Jun 06 '19 at 15:43
  • 1
    `std::tolower` should [not](https://en.cppreference.com/w/cpp/string/byte/tolower#Notes) be called on `char` directly, a `static_cast` to `unsigned char` is needed. – Evg Sep 26 '20 at 09:53
16

See std::lexicographical_compare:

// lexicographical_compare example
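#include <iostream>     // std::cout, std::boolalpha
#include <algorithm>    // std::lexicographical_compare
#include <cctype>       // std::tolower
#include <string>       // std::string

// a case-insensitive comparison function
// (unsigned char parameters keep the argument to std::tolower non-negative):
bool mycomp(unsigned char c1, unsigned char c2) {
    return std::tolower(c1) < std::tolower(c2);
}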

int main() {
    std::string foo = "Apple";
    std::string bar = "apartment";
    
    std::cout << std::boolalpha;
    
    std::cout << "Comparing foo and bar lexicographically (foo<bar):\n";
    
    std::cout << "Using default comparison (operator<): ";
    std::cout << std::lexicographical_compare(foo.begin(), foo.end(), bar.begin(), bar.end());
    std::cout << '\n';
    
    std::cout << "Using custom comparison (mycomp): ";
    std::cout << std::lexicographical_compare(foo.begin(), foo.end(), bar.begin(), bar.end(), mycomp);
    std::cout << '\n';
    
    return 0;
}

Demo
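Because `std::lexicographical_compare` with a case-folding predicate yields a strict weak ordering, the same idea can serve as the comparator of an ordered container. The sketch below assumes ASCII / single-byte text; `CaseInsensitiveLess` and `CaseInsensitiveSet` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparator: orders strings as if both were lower-cased.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

using CaseInsensitiveSet = std::set<std::string, CaseInsensitiveLess>;
```

Inserting "Apple", "apple" and "APPLE" into a `CaseInsensitiveSet` keeps a single element, with the spelling that was inserted first.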

Brian Rodriguez
  • 1
    This method is potentially unsafe and non-portable. `std::tolower` works only if the character is ASCII-encoded. There is no such guarantee for `std::string` - so it can be undefined behavior easily. – plasmacel Mar 27 '18 at 14:27
  • 1
    @plasmacel Then use a function that works w/ other encodings. – Brian Rodriguez Apr 06 '18 at 15:05
  • std::lexicographical_compare looked so promising, until we got to mycomp. :-( – Robin Davies Mar 12 '23 at 07:52
  • This algorithm works on any element type. If you want unicode support, then provide unicode-aware strings with unicode-aware iterators/comparisons. In other words, the issue lies with `std::string` not with `std::lexicographical_compare`. – Brian Rodriguez Jun 06 '23 at 17:28
16
str1.size() == str2.size() && std::equal(str1.begin(), str1.end(), str2.begin(), [](unsigned char a, unsigned char b){return std::tolower(a)==std::tolower(b);})

You can use the above code in C++14 if you are not in a position to use boost. You have to use std::towlower for wide chars.
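For the wide-character case mentioned above, a sketch might look like this. `iequals_wide` is a hypothetical name, and `std::towlower` folds according to the current C locale, so only ASCII letters are guaranteed to fold in the default "C" locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical wide-character counterpart of the one-liner above.
inline bool iequals_wide(const std::wstring& a, const std::wstring& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](wchar_t x, wchar_t y) {
                          return std::towlower(x) == std::towlower(y);
                      });
}
```

This is still a per-code-unit comparison, so it shares the Unicode limitations discussed elsewhere in this thread.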

bcmpinc
vine'th
  • 4
    I think you need to add a `str1.size() == str2.size() &&` to the front so that will not go out of bounds when str2 is a prefix of str1. – ɲeuroburɳ Aug 01 '17 at 14:06
14

Short and nice. No dependencies other than the POSIX extensions to the C library.

strcasecmp(str1.c_str(), str2.c_str()) == 0

returns true if str1 and str2 are equal. strcasecmp may not exist, there could be analogs stricmp, strcmpi, etc.

Example code:

#include <iostream>
#include <string>
#include <strings.h> // For strcasecmp() (POSIX header)

using namespace std;

/// Simple wrapper
inline bool str_ignoreCase_cmp(std::string const& s1, std::string const& s2) {
    if(s1.length() != s2.length())
        return false;  // optimization since std::string holds length in variable.
    return strcasecmp(s1.c_str(), s2.c_str()) == 0;
}

/// Function object - comparator
struct StringCaseInsensetiveCompare {
    bool operator()(std::string const& s1, std::string const& s2) {
        if(s1.length() != s2.length())
            return false;  // optimization since std::string holds length in variable.
        return strcasecmp(s1.c_str(), s2.c_str()) == 0;
    }
    bool operator()(const char *s1, const char * s2){ 
        return strcasecmp(s1,s2)==0;
    }
};


/// Convert bool to string
inline char const* bool2str(bool b){ return b?"true":"false"; }

int main()
{
    cout<< bool2str(strcasecmp("asd","AsD")==0) <<endl;
    cout<< bool2str(strcasecmp(string{"aasd"}.c_str(),string{"AasD"}.c_str())==0) <<endl;
    StringCaseInsensetiveCompare cmp;
    cout<< bool2str(cmp("A","a")) <<endl;
    cout<< bool2str(cmp(string{"Aaaa"},string{"aaaA"})) <<endl;
    cout<< bool2str(str_ignoreCase_cmp(string{"Aaaa"},string{"aaaA"})) <<endl;
    return 0;
}

Output:

true
true
true
true
true
kyb
  • 11
    it is strange that C++ std::string has no ignore-case comparison method.. – kyb Sep 30 '16 at 15:52
  • 1
    "strcasecmp is not part of the standard" - Mark Ransom Dec 1 '14 at 19:57 – Liviu Oct 21 '16 at 14:21
  • 1
    yes, but the most of modern compilers have it or its another-named analog. `stricmp`, `strcmpi`, `strcasecmp`, etc. Thank you. message edited. – kyb Oct 21 '16 at 19:01
  • TODO: use `cout << boolalpha` rather than my `bool2str`, because it makes the stream print bools as text. – kyb Jun 01 '17 at 12:15
  • It's in gcc's libraries. – Owl Aug 07 '17 at 21:09
13

Visual C++ string functions supporting unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx

the one you are probably looking for is _wcsnicmp

Yu Hao
Darren Kopp
  • 7
    Ironically, Microsoft's "wide character codes" are NOT unicode clean because they do not handle unicode normalization. – vy32 Jun 18 '11 at 23:36
12

FYI, strcmp() and stricmp() can read past the end of a buffer that is not null-terminated, since they just process until they hit a null terminator. It's safer to use the length-bounded strncmp() and _strnicmp().

Yu Hao
Wedge
  • 6
    True, although overREADing a buffer is significantly less dangerous than overWRITEing a buffer. – Adam Rosenfield Nov 17 '08 at 20:47
  • 4
    `stricmp()` and `strnicmp()` are not part of the POSIX standard :-( However you can find `strcasecmp()`, `strcasecmp_l()`, `strncasecmp()` and `strncasecmp_l()` in POSIX header `strings.h` :-) see [opengroup.org](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/strings.h.html#tag_13_52) – oHo Apr 11 '13 at 12:27
  • 4
    @AdamRosenfield 'worse' depends on context. In security, sometimes the whole point of an overwrite is to get to overread. – karmakaze Mar 21 '15 at 16:22
11

The Boost.String library has a lot of algorithms for doing case-insensitive comparisons and so on.

You could implement your own, but why bother when it's already been done?

Dean Harding
9

For my basic case insensitive string comparison needs I prefer not to have to use an external library, nor do I want a separate string class with case insensitive traits that is incompatible with all my other strings.

So what I've come up with is this:

bool icasecmp(const string& l, const string& r)
{
    return l.size() == r.size()
        && equal(l.cbegin(), l.cend(), r.cbegin(),
            [](string::value_type l1, string::value_type r1)
                { return toupper(l1) == toupper(r1); });
}

bool icasecmp(const wstring& l, const wstring& r)
{
    return l.size() == r.size()
        && equal(l.cbegin(), l.cend(), r.cbegin(),
            [](wstring::value_type l1, wstring::value_type r1)
                { return towupper(l1) == towupper(r1); });
}

A simple function with one overload for char and another for wchar_t. It doesn't use anything non-standard, so it should be fine on any platform.

The equality comparison won't consider issues like variable length encoding and Unicode normalization, but basic_string has no support for that that I'm aware of anyway and it isn't normally an issue.

In cases where more sophisticated lexicographical manipulation of text is required, then you simply have to use a third party library like Boost, which is to be expected.
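If a single template is preferred over the two overloads, one option is the locale-aware `std::toupper(c, loc)` function template from `<locale>`, which works for both `char` and `wchar_t`. This is a sketch of that alternative, not the approach used above; `icasecmp_any` is a hypothetical name, and defaulting to the classic "C" locale is an assumption made here.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical single-template variant. std::toupper(CharT, const std::locale&)
// dispatches to the locale's std::ctype<CharT> facet, so no per-type overload
// or specialization is needed.
template <class CharT, class Traits, class Alloc>
bool icasecmp_any(const std::basic_string<CharT, Traits, Alloc>& l,
                  const std::basic_string<CharT, Traits, Alloc>& r,
                  const std::locale& loc = std::locale::classic()) {
    return l.size() == r.size() &&
           std::equal(l.begin(), l.end(), r.begin(),
                      [&loc](CharT a, CharT b) {
                          return std::toupper(a, loc) == std::toupper(b, loc);
                      });
}
```

The facet lookup on every character can be slower than the plain `<cctype>` functions, so the overload pair may still be the better choice in hot paths.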

Neutrino
  • 2
    You could probably make that one function if you made it a template and used basic_string instead of separate string/wstring versions? – uliwitness Jan 22 '14 at 13:31
  • 2
    How would the single function template invoke either toupper or towupper without resorting to use of specialization or macros, a function overload seems like a simpler and more appropriate implementation than either. – Neutrino Jun 28 '15 at 15:39
9

Doing this without using Boost can be done by getting the C string pointer with c_str() and using strcasecmp:

std::string str1 = "aBcD";
std::string str2 = "AbCd";
if (strcasecmp(str1.c_str(), str2.c_str()) == 0)
{
    // Equal, ignoring case
}
DavidS
6

Assuming you are looking for a method and not a magic function that already exists, there is frankly no better way. We could all write code snippets with clever tricks for limited character sets, but at the end of the day, at some point you have to convert the characters.

The best approach for this conversion is to do so prior to the comparison. This allows you a good deal of flexibility when it comes to encoding schemes, which your actual comparison operator should be ignorant of.

You can of course 'hide' this conversion behind your own string function or class, but you still need to convert the strings prior to comparison.

Andrew Grant
6

I wrote a case-insensitive version of char_traits for use with std::basic_string in order to generate a std::string that is not case-sensitive when doing comparisons, searches, etc using the built-in std::basic_string member functions.

So in other words, I wanted to do something like this.

std::string a = "Hello, World!";
std::string b = "hello, world!";

assert( a == b );

...which std::string can't handle. Here's the usage of my new char_traits:

istring a = "Hello, World!";
istring b = "hello, world!";

assert( a == b );

...and here's the implementation:

/*  ---

        Case-Insensitive char_traits for std::string's

        Use:

            To declare a std::string which preserves case but ignores case in comparisons & search,
            use the following syntax:

                std::basic_string<char, char_traits_nocase<char> > noCaseString;

            A typedef is declared below which simplifies this use for chars:

                typedef std::basic_string<char, char_traits_nocase<char> > istring;

    --- */

    template<class C>
    struct char_traits_nocase : public std::char_traits<C>
    {
        static bool eq( const C& c1, const C& c2 )
        { 
            return ::toupper(c1) == ::toupper(c2); 
        }

        static bool lt( const C& c1, const C& c2 )
        { 
            return ::toupper(c1) < ::toupper(c2);
        }

        static int compare( const C* s1, const C* s2, size_t N )
        {
            return _strnicmp(s1, s2, N);
        }

        static const C* find( const C* s, size_t N, const C& a )
        {
            for( size_t i=0 ; i<N ; ++i )
            {
                if( ::toupper(s[i]) == ::toupper(a) ) 
                    return s+i ;
            }
            return 0 ;
        }

        static bool eq_int_type( const typename std::char_traits<C>::int_type& c1, const typename std::char_traits<C>::int_type& c2 )
        { 
            return ::toupper(c1) == ::toupper(c2) ; 
        }       
    };

    template<>
    struct char_traits_nocase<wchar_t> : public std::char_traits<wchar_t>
    {
        static bool eq( const wchar_t& c1, const wchar_t& c2 )
        { 
            return ::towupper(c1) == ::towupper(c2); 
        }

        static bool lt( const wchar_t& c1, const wchar_t& c2 )
        { 
            return ::towupper(c1) < ::towupper(c2);
        }

        static int compare( const wchar_t* s1, const wchar_t* s2, size_t N )
        {
            return _wcsnicmp(s1, s2, N);
        }

        static const wchar_t* find( const wchar_t* s, size_t N, const wchar_t& a )
        {
            for( size_t i=0 ; i<N ; ++i )
            {
                if( ::towupper(s[i]) == ::towupper(a) ) 
                    return s+i ;
            }
            return 0 ;
        }

        static bool eq_int_type( const int_type& c1, const int_type& c2 )
        { 
            return ::towupper(c1) == ::towupper(c2) ; 
        }       
    };

    typedef std::basic_string<char, char_traits_nocase<char> > istring;
    typedef std::basic_string<wchar_t, char_traits_nocase<wchar_t> > iwstring;
John Dibling
  • 99,718
  • 31
  • 186
  • 324
  • 2
    This works for regular chars, but won't work for all of Unicode, as capitalization is not necessarily bidirectional (there's a good example in Greek involving sigma that I can't remember right now; something like it has two lowercase forms and one uppercase form, and you can't get a proper comparison either way) – coppro Nov 24 '08 at 21:02
  • 1
    That's really the wrong way to go about it. Case sensitivity should not be a property of the strings themselves. What happens when the same string object needs both case-sensitive and case insensitive comparisons? – Ferruccio Nov 24 '08 at 21:07
  • If case-sensitivity isn't appropriate to be "part of" the string, then neither is the find() function at all. Which, for you, might be true, and that's fine. IMO the greatest thing about C++ is that it doesn't force a particular paradigm on the programmer. It is what you want/need it to be. – John Dibling Nov 25 '08 at 06:20
  • Actually, I think most C++ gurus (like the ones on the standards committee) agree that it was a mistake to put find() in std::basic_string<> along with a whole lot of other things that could equally well be placed in free functions. Besides, there are some issues with putting it in the type. – Andreas Magnusson Nov 25 '08 at 07:50
  • As others have pointed out, there are two major things wrong with this solution (ironically, one is the interface and the other is the implementation ;-)). – Konrad Rudolph Nov 25 '08 at 08:15
  • … but since Herb Sutter has made the same mistake and I've apparently even linked his article (I don't remember this!), I can't very well complain. – Konrad Rudolph Nov 25 '08 at 08:18
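For readers on compilers that lack _strnicmp, a minimal portable sketch of the same traits idea (char only) might look like the following. The names ci_char_traits and ci_string are hypothetical and not part of the answer above, and the comparison is still done one char at a time, so the Unicode caveat raised in the comments applies unchanged.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical portable variant of the traits class above (char only).
struct ci_char_traits : public std::char_traits<char>
{
    // Uppercase one char; the cast avoids passing negative values to toupper().
    static int up(char c)
    {
        return std::toupper(static_cast<unsigned char>(c));
    }

    static bool eq(char c1, char c2) { return up(c1) == up(c2); }
    static bool lt(char c1, char c2) { return up(c1) < up(c2); }

    // Hand-written loop instead of the compiler-specific _strnicmp.
    static int compare(const char* s1, const char* s2, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }

    static const char* find(const char* s, std::size_t n, char a)
    {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], a)) return s + i;
        return nullptr;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this, ci_string("Hello") == ci_string("hELLO") holds, and find() ignores case as well.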
6

Late to the party, but here is a variant that uses std::locale, and thus correctly handles Turkish:

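// std::bind1st and std::mem_fun were removed in C++17, so wrap the facet's
// single-character tolower() in a lambda instead (assumes local scope).
const std::ctype<char>& facet =
    std::use_facet<std::ctype<char> >(std::locale());
auto tolower = [&facet](char c) { return facet.tolower(c); };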

gives you a functor that uses the active locale to convert characters to lowercase, which you can then use via std::transform to generate lower-case strings:

std::string left = "fOo";
std::transform(left.begin(), left.end(), left.begin(), tolower);

This also works for wchar_t based strings.
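Since the question asks for a comparison that does not transform the strings, the same facet can also be applied character by character. The following is only a sketch under that assumption; iequals_locale is a hypothetical name, and it still compares one char at a time, so it is only suitable for single-byte encodings.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: compares two strings case-insensitively using the
// std::ctype facet of the given locale, without building lowercase copies.
bool iequals_locale(const std::string& a, const std::string& b,
                    const std::locale& loc = std::locale())
{
    // Strings of different length can never be equal.
    if (a.size() != b.size())
        return false;

    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    return std::equal(a.begin(), a.end(), b.begin(),
                      [&facet](char x, char y) {
                          return facet.tolower(x) == facet.tolower(y);
                      });
}
```

Called as iequals_locale("fOo", "FOO"), it uses a copy of the global locale; pass a specific std::locale as the third argument to choose another one.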

Simon Richter
  • 28,572
  • 1
  • 42
  • 64
4

I've had good experience using the International Components for Unicode libraries - they're extremely powerful, and provide methods for conversion, locale support, date and time rendering, case mapping (which you don't seem to want), and collation, which includes case- and accent-insensitive comparison (and more). I've only used the C++ version of the libraries, but they appear to have a Java version as well.

Methods exist to perform normalized compares as referred to by @Coincoin, and can even account for locale - for example (and this is a sorting example, not strictly equality), traditionally in Spanish (in Spain), the letter combination "ll" sorts between "l" and "m", so "lz" < "ll" < "ma".

Blair Conrad
  • 233,004
  • 25
  • 132
  • 111
4

Just use strcmp() for case-sensitive and strcmpi() or stricmp() for case-insensitive comparison. Both are declared in the header file <string.h>, although strcmpi() and stricmp() are non-standard extensions that not every compiler provides.

format:

int strcmp(const char*,const char*);    //for case sensitive
int strcmpi(const char*,const char*);   //for case insensitive

Usage:

string a="apple",b="ApPlE",c="ball";
if(strcmpi(a.c_str(),b.c_str())==0)      //(if it is a match it will return 0)
    cout<<a<<" and "<<b<<" are the same"<<"\n";
if(strcmpi(a.c_str(),c.c_str())<0)       //(negative if the first string sorts first)
    cout<<a[0]<<" comes before "<<c[0]<<", so "<<a<<" comes before "<<c;

Output

apple and ApPlE are the same

a comes before b, so apple comes before ball

reubenjohn
  • 1,351
  • 1
  • 18
  • 43
3

A simple way to compare two strings in C++ (tested on Windows) is to use _stricmp:

// Case-insensitive comparison
result = _stricmp( string1, string2 );  

If you are looking to use with std::string, an example:

std::string s1 = "Hello";
if ( _stricmp(s1.c_str(), "HELLO") == 0)
   std::cout << "The strings are equal.";

For more information, see: https://msdn.microsoft.com/it-it/library/e0z9k731.aspx

eyllanesc
  • 235,170
  • 19
  • 170
  • 241
DAme
  • 697
  • 8
  • 21
  • 1
    It's worth reading https://stackoverflow.com/a/12414441/95309 in addition to this answer, as it's a) a C function, and b) supposedly not portable. – Claus Jørgensen Aug 23 '18 at 12:18
  • what #include do we need to make this work? – ekkis Jun 22 '19 at 21:37
  • 1
    @ekkis to use _stricmp you have to include <string.h>, as you can read here: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/stricmp-wcsicmp-mbsicmp-stricmp-l-wcsicmp-l-mbsicmp-l?view=vs-2019 – DAme Jul 01 '19 at 07:47
  • 1
    Nice try microsoft! – AdrianTut Mar 25 '22 at 10:38
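The portability concern raised in the comments can be worked around with a thin wrapper that picks whichever function the platform offers. This is a sketch, not a definitive solution: compare_nocase is a hypothetical name, and the code assumes that non-Windows targets provide the POSIX strcasecmp in <strings.h>.

```cpp
#include <string>

#if defined(_WIN32)
#include <string.h>   // _stricmp (Microsoft CRT)
#else
#include <strings.h>  // strcasecmp (POSIX)
#endif

// Hypothetical wrapper: returns <0, 0 or >0 like strcmp, ignoring case.
// Both underlying functions stop at the first '\0', so embedded nulls
// in a std::string are not compared.
inline int compare_nocase(const std::string& a, const std::string& b)
{
#if defined(_WIN32)
    return _stricmp(a.c_str(), b.c_str());
#else
    return strcasecmp(a.c_str(), b.c_str());
#endif
}
```

Neither function is Unicode-aware; both work on single bytes according to the current C locale.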
2

As of early 2013, the ICU project, maintained by IBM, is a pretty good answer to this.

http://site.icu-project.org/

ICU is a "complete, portable Unicode library that closely tracks industry standards." For the specific problem of string comparison, the Collator class does what you want.

The Mozilla Project adopted ICU for internationalization in Firefox in mid-2012; you can track the engineering discussion, including issues of build systems and data file size, here:

michaelhanson
  • 351
  • 2
  • 5
2

Just a note on whatever method you finally choose, if that method happens to include the use of strcmp that some answers suggest:

strcmp doesn't work with Unicode data in general. It doesn't even work with byte-based Unicode encodings, such as UTF-8, since strcmp only makes byte-by-byte comparisons and Unicode code points encoded in UTF-8 can take more than one byte. The only Unicode case strcmp handles properly is when a string encoded with a byte-based encoding contains only code points below U+00FF; then the byte-by-byte comparison is enough.
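The same limitation applies to case-insensitive routines that work byte by byte. The sketch below illustrates it; bytewise_iequals and the two string constants are hypothetical names introduced for the example, and the behaviour described assumes the default "C" locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive equality done one byte at a time.
bool bytewise_iequals(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // The cast avoids passing negative values to tolower().
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// UTF-8 encodings of U+00C9 (É) and U+00E9 (é); each takes two bytes.
const std::string upper_e_acute = "\xC3\x89";
const std::string lower_e_acute = "\xC3\xA9";
```

In the "C" locale, bytewise_iequals("ABC", "abc") is true, but bytewise_iequals(upper_e_acute, lower_e_acute) is false: tolower() only ever sees the individual bytes 0x89 and 0xA9, never the code points they encode.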

Johann Gerell
  • 24,991
  • 10
  • 72
  • 122
2

It looks like the solutions above don't use the compare method and reimplement everything instead, so here is my solution. I hope it works for you (it works fine for me).

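#include<iostream>
#include<string>
#include<cctype>
using namespace std;
// Returns a lowercase copy of the given string.
string tolow(string a)
{
    for(unsigned int i=0;i<a.length();i++)
    {
        // Cast to unsigned char: tolower() is undefined for negative values.
        a[i]=tolower(static_cast<unsigned char>(a[i]));
    }
    return a;
}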
int main()
{
    string str1,str2;
    cin>>str1>>str2;
    int temp=tolow(str1).compare(tolow(str2));
    if(temp>0)
        cout<<1;
    else if(temp==0)
        cout<<0;
    else
        cout<<-1;
}
1

If you often have to compare one source string against other strings, one elegant solution is to use a regex.

std::wstring first = L"Test";
std::wstring second = L"TEST";

std::wregex pattern(first, std::wregex::icase);
bool isEqual = std::regex_match(second, pattern);
smibe
  • 159
  • 10
  • Tried this but compile error: `error: conversion from 'const char [5]' to non-scalar type 'std::wstring {aka std::basic_string<wchar_t>}' requested` – Deqing May 15 '15 at 05:18
  • bad idea. It is the worst solution. – Behrouz.M Jun 01 '15 at 13:05
  • This isn't a good solution, but even if you wanted to use it, you need an L in front of your widestring constants, eg L"TEST" – celticminstrel Jun 21 '15 at 02:27
  • Would be nice if someone could explain why it is the worst solution. Because of performance issues? Creating the regex is expensive, but afterwards the comparison should be really fast. – smibe Sep 30 '15 at 12:49
  • it's usable and portable, the major problem is that first can't contain any characters that regex uses. It can't be used as a general string compare because of that. It will also be slower, there is a flag to make it work the way smibe says but still can't be used as a general function. – Ben Aug 16 '16 at 21:37
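The metacharacter problem mentioned in the last comment can be reduced by escaping the source string before building the pattern. The following is a sketch only: escape_regex and iequals_regex are hypothetical names, the default ECMAScript grammar is assumed, and the performance concern from the comments still applies.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escapes every ECMAScript metacharacter
// so that the text is matched literally.
std::wstring escape_regex(const std::wstring& text)
{
    static const std::wstring special = L".^$|()[]{}*+?\\";
    std::wstring out;
    for (wchar_t c : text)
    {
        if (special.find(c) != std::wstring::npos)
            out += L'\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: whole-string, case-insensitive match against 'first'.
bool iequals_regex(const std::wstring& first, const std::wstring& second)
{
    std::wregex pattern(escape_regex(first), std::regex_constants::icase);
    return std::regex_match(second, pattern);
}
```

With the escaping in place, a source string such as L"a.c" matches L"A.C" but no longer matches L"abc".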
1

If you don't want to use the Boost library, here is a solution that uses only the C++ standard I/O header.

#include <iostream>

struct iequal
{
    bool operator()(int c1, int c2) const
    {
        // case insensitive comparison of two characters.
        return std::toupper(c1) == std::toupper(c2);
    }
};

bool iequals(const std::string& str1, const std::string& str2)
{
    // use std::equal() to compare range of characters using the functor above.
    return std::equal(str1.begin(), str1.end(), str2.begin(), iequal());
}

int main(void)
{
    std::string str_1 = "HELLO";
    std::string str_2 = "hello";

    if(iequals(str_1,str_2))
    {
        std::cout<<"String are equal"<<std::endl;   
    }

    else
    {
        std::cout<<"String are not equal"<<std::endl;
    }


    return 0;
}
Richard Chambers
  • 16,643
  • 4
  • 81
  • 106
Haseeb Mir
  • 928
  • 1
  • 13
  • 22
  • I believe std::toupper is in #include <cctype>, you might need to include it. – David Ledger Dec 02 '18 at 12:00
  • If you use the global version, ::toupper, then you might not need to include <cctype>, because there are two versions, I guess: the C version and the C++ version with locale. So it's better to use the global version, "::toupper()". – Haseeb Mir Dec 02 '18 at 13:52
  • this solution fails when one of the strings is empty: "" -- it returns true in that case when it should return false – ekkis Jun 22 '19 at 21:35
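The length problem noted in the last comment comes from the three-iterator form of std::equal, which assumes the second range is at least as long as the first. A minimal sketch of a variant with a size check follows; iequals_checked is a hypothetical name, and the headers listed are the ones this version relies on.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical variant of iequals() above with a length check added.
bool iequals_checked(const std::string& str1, const std::string& str2)
{
    // Reject differing lengths up front so std::equal never reads past
    // the end of the shorter string.
    if (str1.size() != str2.size())
        return false;

    // Taking the characters as unsigned char keeps toupper() well defined
    // for values above 127.
    return std::equal(str1.begin(), str1.end(), str2.begin(),
                      [](unsigned char c1, unsigned char c2) {
                          return std::toupper(c1) == std::toupper(c2);
                      });
}
```

Comparing an empty string with "hello" now returns false, while two empty strings still compare equal.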
-1
bool insensitive_c_compare(char A, char B){
  static const char mid_c = ('Z' + 'a') / 2; /// a value between the uppercase and lowercase ranges
  static const char lo2up = 'A' - 'a'; /// the offset from lowercase to uppercase letters

  if (('a' <= A and A <= 'z') or ('A' <= A and A <= 'Z'))
      if (('a' <= B and B <= 'z') or ('A' <= B and B <= 'Z'))
      /// check that the character is in fact a letter
      /// (trying to turn a 3 into an E would not be pretty!)
      {
        if ((A > mid_c and B > mid_c) or (A < mid_c and B < mid_c))
        {
          return A == B; /// both letters already have the same case
        }
        else
        {
          if (A > mid_c)
            A = A - 'a' + 'A'; 
          if (B > mid_c)/// convert all lowercase letters to uppercase ones
            B = B - 'a' + 'A';
          /// this could be changed to B = B + lo2up;
          return A == B;
        }
      }
  return A == B; /// at least one non-letter: fall back to an exact comparison
}

This could probably be made much more efficient, but here is a bulky version with all its bits bare.

It is not all that portable (it assumes ASCII), but it works well with whatever is on my computer (no idea, I am of pictures not words).

user4578093
  • 231
  • 1
  • 3
  • 10
-4

An easy way to compare strings that differ only in letter case is to do an ASCII comparison. Uppercase and lowercase letters differ by 32 in the ASCII table; using this information, we have the following...

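    size_t count = 0;
    // Only walk the strings when they have the same length.
    for( size_t i = 0; string1.length() == string2.length() && i < string2.length(); i++)
    {
        if (string1[i] == string2[i] || int(string1[i]) == int(string2[i])+32 || int(string1[i]) == int(string2[i])-32)
        {
            count++;
        }
        else
        {
            break;
        }
    }
    if(string1.length() == string2.length() && count == string2.length())
    {
        //then we have a match
    }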
HaveNoDisplayName
  • 8,291
  • 106
  • 37
  • 47
  • 3
    According to this, "++j" will be found equal to "KKJ", and "1234" will be found equal to "QRST". I doubt that's something anyone wants. – celticminstrel Jun 21 '15 at 02:24
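The false matches described in the comment come from applying the offset of 32 to every character instead of to letters only. A sketch that restricts the folding to 'A'..'Z' is shown below; ascii_lower and ascii_iequals are hypothetical names, and an ASCII-compatible character set is assumed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: folds 'A'..'Z' to 'a'..'z' and leaves every other
// character untouched. Assumes an ASCII-compatible character set.
inline char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + 32) : c;
}

// Hypothetical helper: ASCII-only, case-insensitive equality.
bool ascii_iequals(const std::string& s1, const std::string& s2)
{
    if (s1.size() != s2.size())
        return false;
    for (std::size_t i = 0; i < s1.size(); ++i)
    {
        if (ascii_lower(s1[i]) != ascii_lower(s2[i]))
            return false;
    }
    return true;
}
```

With this version "Hello" still matches "hELLO", while "++j" and "KKJ" (or "1234" and "QRST") no longer compare equal.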