Case Insensitive String Comparison in C

Question

I have two postcodes char* that I want to compare, ignoring case. Is there a function to do this?

Or do I have to loop through each string, apply the tolower function to every character, and then do the comparison?

Any idea how such a function will react to numbers in the string?

Thanks

I think I wrote that in a bad way, postcode is not a type , just the real world value the char* will hold. — bond425, Apr 28 '11 at 15:11
What platform are you on? Many platforms have a platform-specific function to do this. — Random832, Apr 28 '11 at 15:11
If you are comparing a number with a letter, then you know the strings aren't equivalent, regardless of case. — Alex Reynolds, Apr 28 '11 at 15:11
I assume you just mean ASCII string comparison? Not generic to the whole world across multiple locales? — Doug T., Apr 28 '11 at 15:11
The comparison could result in comparing a number and a letter, I need to test if two postcodes are equal to each other, one is greater than or one is less than. The greater than, less than part is confusing, I'm not sure how that's going to work out — bond425, Apr 28 '11 at 16:49

score 69 · Answer 1 · edited Aug 23 '18 at 18:29

There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own:

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
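
As a sketch of how the function above could be applied to the postcodes in the question: digits have no case, so `tolower()` returns them unchanged and they compare by character code exactly as with `strcmp()`. The sign of the result gives the ordering. The helper name `postcode_order` is an assumption made for illustration.

```c
#include <ctype.h>

/* Same loop as strcicmp() above, repeated so this sketch compiles on its own. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical helper: collapse the result to -1, 0 or 1
 * (less than, equal, greater than). */
int postcode_order(char const *a, char const *b)
{
    int d = strcicmp(a, b);
    return (d > 0) - (d < 0);
}
```

With this, `postcode_order("sw1a 1aa", "SW1A 1AA")` is 0, and `postcode_order("AB1 2CD", "ab1 3cd")` is negative because the first difference is `'2'` against `'3'`.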

edited Aug 23 '18 at 18:29

chux - Reinstate Monica

answered Apr 28 '11 at 15:21

Fred Foo

This implementation is not correct; it will incorrectly return 0 when b is a substring of a. For example it will return 0 for strcicmp("another", "an") but it should return 1 – RobertoP May 07 '12 at 19:05
This also seems very inefficient. – Jonathan Wood Oct 21 '14 at 19:59
This is bad advice. There is no reason to "write your own" standard C text functions to deal with a simple name difference. Do #ifdef _WINDOWS ... #define strcasecmp stricmp ... #endif and put it in an appropriate header. The above comments where the author had to fix the function to work right is why rewriting standard C functions is counter-productive if a far simpler solution is available. – B. Nadolson Feb 12 '15 at 11:49
Neither _stricmp nor strcasecmp is available in -std=c++11. They also have different semantics with regards to locale. – minexew May 01 '15 at 16:15
This implementation is correct. stricmp("another", "an") will return ('o' - '\0') or (111 - 0) which equates to 111. – Tails86 Mar 29 '16 at 22:01
This will break awfully when `a` or `b` are `NULL`. – YoTengoUnLCD Nov 14 '17 at 15:39
@YoTengoUnLCD Re: [break awfully when a or b are NULL](https://stackoverflow.com/questions/5820810/case-insensitive-string-comp-in-c/51992138#comment81529857_5820991). Breaking with `a` and/or `b` as `NULL` is commonly accepted practice as a _null pointer_ does not point to a _string_. Not a bad check to add, yet what to return? Should `cmp("", NULL)` return 0, INT_MIN? There is not consensus on this. Note: C allows UB with `strcmp(NULL, "abc");`. – chux - Reinstate Monica Aug 23 '18 at 18:34
This implementation doesn't seem quite right. For a correct implementation that fixes all of the problems noted in other comments here, and for a full, runnable example see here: https://stackoverflow.com/a/55293507/4561887 – Gabriel Staples Mar 22 '19 at 17:03
@YoTengoUnLCD, @chux, [in my answer](https://stackoverflow.com/questions/5820810/case-insensitive-string-comp-in-c/55293507#55293507) I chose to return `INT_MIN` for the NULL ptr case. – Gabriel Staples May 21 '19 at 20:57

score 49 · Answer 2 · edited Feb 22 '21 at 19:45

Take a look at strcasecmp() in strings.h.
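
On a POSIX system, a minimal sketch of using it might look like the following. The wrapper name `same_postcode` is an assumption for illustration, and only the sign of the value returned by `strcasecmp()` is meaningful.

```c
/* Ask for the POSIX declarations even under a strict -std= mode. */
#define _POSIX_C_SOURCE 200809L
#include <strings.h>   /* POSIX header that declares strcasecmp() */

/* Hypothetical wrapper: nonzero when the two strings match ignoring case. */
int same_postcode(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}
```

A call such as `same_postcode("ec1a 1bb", "EC1A 1BB")` then reports a match, while a negative or positive result from `strcasecmp()` itself gives the ordering.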

edited Feb 22 '21 at 19:45

Neuron

answered Apr 28 '11 at 15:11

Mihran Hovsepyan

I think you mean `int strcasecmp(const char *s1, const char *s2);` in strings.h – Brigham Apr 28 '11 at 15:15
Yes this is what I mean :) but maybe you have typo, not in stringS.h but in string.h – Mihran Hovsepyan Apr 28 '11 at 15:17
This function is non-standard; Microsoft calls it `stricmp`. @entropo: `strings.h` is a header for compatibility with 1980s Unix systems. – Fred Foo Apr 28 '11 at 15:19
@entropo Maybe there is also strings.h, but as far as I know string.h is standart one and there is `int strcasecmp(const char * s1, const char * s2)` function there. http://www.mkssoftware.com/docs/man3/strcasecmp.3.asp – Mihran Hovsepyan Apr 28 '11 at 15:19
@entropo: apologies, POSIX does seem to define `strings.h`. It also defined `strcasecmp`, to be declared in that header. ISO C doesn't have it, though. – Fred Foo Apr 28 '11 at 15:24
See: [difference-between-string-h-and-strings-h](http://stackoverflow.com/questions/4291149/difference-between-string-h-and-strings-h/4291328#4291328) . Some C standard libraries have merged all of the non-deprecated functions into `string.h`. See, e.g., [Glibc](http://www.gnu.org/software/libc/manual/html_mono/libc.html#String_002fArray-Comparison) – entropo Apr 28 '11 at 15:26
Yes it seems there is such header strings.h and in theory `strcasecmp` should be declared there. But all compilers I used have `strcasecmp` declared in string.h. at least cl, g++, forte c++ compilers has it. – Mihran Hovsepyan Apr 28 '11 at 15:27
@Mihran: this has nothing to do with the compiler. It's a library issue. – Fred Foo Apr 28 '11 at 15:29
Yes I understand this. Saying "compiler header file" I mean the file that automatically comes with compiler during its installation. – Mihran Hovsepyan Apr 28 '11 at 15:31

score 10 · Answer 3 · answered Jan 04 '16 at 11:16

I found a built-in function for this, strcasecmp, in <strings.h>, a header that contains string functions additional to those in the standard header <string.h>.

Here are the relevant signatures:

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);

I also found its counterpart in the XNU kernel (osfmk/device/subrs.c), implemented as shown below. Only letters are changed before the comparison, so you should not expect numbers to behave any differently than they do with the original strcmp function.

static int tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
}

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const u_char *)s1,
                        *us2 = (const u_char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}

`strcasecmp()` and `strncasecmp()` are not part of the standard C library, but common additions in *nix. — chux - Reinstate Monica, Aug 23 '18 at 18:38
Note that there's no reason to implement your own `tolower()` function if you're compiling with a standards-compliant compiler/C implementation - `tolower()` is a required function per effectively every version of the C standard. — Andrew Henle, Feb 22 '21 at 19:59

chux - Reinstate Monica · Answer 4 · 2021-01-28T19:42:23.797

Additional pitfalls to watch out for when doing case insensitive compares:

Comparing as lower or as upper case? (common enough issue)

Both below will return 0 with strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return different signed results as '_' is often between the upper and lower case letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions like the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.

int strcicmpL(char const *a, char const *b) {
  while (*b) {
    int d = tolower(*a) - tolower(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return tolower(*a);
}

int strcicmpU(char const *a, char const *b) {
  while (*b) {
    int d = toupper(*a) - toupper(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return toupper(*a);
}
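
To make the sort-order point concrete, here is a small sketch that compares a single pair of characters both ways. The helper names are made up for illustration, and the values in the comment assume ASCII and the default "C" locale.

```c
#include <ctype.h>

/* Difference after folding both characters to lower case. */
int diff_lower(unsigned char a, unsigned char b)
{
    return tolower(a) - tolower(b);
}

/* Difference after folding both characters to upper case. */
int diff_upper(unsigned char a, unsigned char b)
{
    return toupper(a) - toupper(b);
}

/* In ASCII, '_' (95) lies between 'Z' (90) and 'a' (97), so
 * diff_lower('A', '_') is 'a' - '_' =   2  (positive), while
 * diff_upper('A', '_') is 'A' - '_' = -30  (negative). */
```

A sort that uses the lower-case comparison therefore places "_" before "A", and one that uses the upper-case comparison places it after.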

char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char was converted to unsigned char, regardless of whether char is signed or unsigned.

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)

char can have a negative value and not 2's complement. (rare)

The above does not handle -0 nor other negative values properly as the bit pattern should be interpreted as unsigned char. To properly handle all integer encodings, change the pointer type first.

// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
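
A small sketch of the pointer-cast form wrapped in a helper (the name `lower_at` is an assumption for illustration). It reads the byte as `unsigned char`, so the argument passed to `tolower()` is always in the range the function is specified for, whether plain `char` is signed or not.

```c
#include <ctype.h>

/* Hypothetical helper: lower-case the character that p points at,
 * reading it as unsigned char so tolower() never sees a negative value. */
int lower_at(const char *p)
{
    return tolower(*(const unsigned char *)p);
}
```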

Locale (less common)

Although character sets using ASCII code (0-127) are ubiquitous, the remainder codes tend to have locale specific issues. So strcasecmp("\xE4", "a") might return a 0 on one system and non-zero on another.

Unicode (the way of the future)

If a solution needs to handle more than ASCII consider a unicode_strcicmp(). As C lib does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.

Do all letters map one lower to one upper? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map various lower case characters to one upper and vice versa. Further, some uppercase characters may lack a lower case equivalent and, again, vice versa.

This obliges code to convert through both toupper() and tolower().

int d = tolower(toupper(*a)) - tolower(toupper(*b));

Again, potential different results for sorting if code did tolower(toupper(*a)) vs. toupper(tolower(*a)).

Portability

@B. Nadolson recommends avoiding rolling your own strcicmp(), and this is reasonable, except when code needs equivalent functionality that is highly portable.

Below is an approach that even performed faster than some system provided functions. It does a single compare per loop rather than two by using 2 different tables that differ with '\0'. Your results may vary.

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}
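
The tables above are abbreviated with `...`. One way to get a runnable version is to fill them at start-up with `tolower()` instead of writing out every entry; the sketch below does that. The initializer `strcicmp_ch_init()` is an assumption added for illustration and must be called once before the first comparison. As described above, the two tables differ only in entry 0.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

/* Assumed helper (not in the answer): fill both tables once. */
void strcicmp_ch_init(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        low1[i] = (unsigned char)tolower(i);
        low2[i] = (unsigned char)tolower(i);
    }
    /* low1[] holds no 'A' (it is folded to 'a'), so this entry never
     * matches. That makes the loop below stop at a null character in
     * either string. */
    low2[0] = 'A';
}

int strcicmp_ch(char const *a, char const *b)
{
    /* One comparison per iteration, using tables that differ at '\0'. */
    while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
        a++;
        b++;
    }
    /* Either the strings differ or a null character was reached. */
    return low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b];
}
```

Filling the tables at run time also picks up whatever mapping the current locale's `tolower()` provides, which may or may not be what you want.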

score 6 · Answer 5 · answered Apr 28 '11 at 15:17

I would use stricmp(). It compares two strings without regard to case.

Note that, in some cases, converting the string to lower case can be faster.

answered Apr 28 '11 at 15:17

Jonathan Wood

Gabriel Staples · Answer 6 · 2022-09-29T20:24:44.837

`strncmpci()`, a direct, drop-in case-insensitive string comparison replacement for `strncmp()` and `strcmp()`

I'm not really a fan of the most-upvoted answer here (in part because it seems like it isn't correct since it should continue if it reads a null terminator in either string--but not both strings at once--and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

It is identical to strncmp() except:

It is case-insensitive.
The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.

LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on unicode characters, such as unicode character encodings UTF-8 (the most popular), UTF-16, and UTF-32.

Here is the code only (no comments):

int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Fully-commented version:

/// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
///             if two C-strings are equal.
/// \note       1. Identical to `strncmp()` except:
///               1. It is case-insensitive.
///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
///               - Aided/inspired, in part, by `strcicmp()` here:
///                 https://stackoverflow.com/a/5820991/4561887.
/// \param[in]  str1        C string 1 to be compared.
/// \param[in]  str2        C string 2 to be compared.
/// \param[in]  num         max number of chars to compare
/// \return     A comparison code (identical to `strncmp()`, except with the addition
///             of `INT_MIN` as a special sentinel value):
///
///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
///                      of the input strings is a NULL pointer).
///             <0       The first character that does not match has a lower value in str1 than
///                      in str2.
///              0       The contents of both strings are equal.
///             >0       The first character that does not match has a greater value in str1 than
///                      in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
    // long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
    // of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
    // that string still has more characters in it.
    // Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
    // `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
    // both of these C-strings outside of their array bounds.
    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Test code:

Download the entire sample code, with unit tests, from my eRCaGuy_hello_world repository here: "strncmpci.c":

(this is just a snippet)

int main()
{
    printf("-----------------------\n"
           "String Comparison Tests\n"
           "-----------------------\n\n");

    int num_failures_expected = 0;

    printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
    EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
    num_failures_expected++;
    printf("------ beginning ------\n\n");


    const char * str1;
    const char * str2;
    size_t n;

    // NULL ptr checks
    EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);

    EXPECT_EQUALS(strncmpci("", "", 0), 0);
    EXPECT_EQUALS(strncmp("", "", 0), 0);

    str1 = "";
    str2 = "";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "hEYd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');

    str1 = "heY";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');

    str1 = "hey";
    str2 = "hey";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), -'d');

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hEY";
    str2 = "heyYOU";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEY";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEYHowAre";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');


    if (globals.error_count == num_failures_expected)
    {
        printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
    }
    else
    {
        printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
            ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
    }

    assert(globals.error_count == num_failures_expected);
    return globals.error_count;
}
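
The snippet above depends on an `EXPECT_EQUALS()` macro and a `globals` structure (as well as the `ANSI_COLOR_*` macros) that are defined in the full file and not shown here. A hypothetical stand-in for the first two, written only to match how they are used above and in the sample output, could look like this. It is an assumption, not the author's actual code.

```c
#include <stdio.h>

/* Assumed stand-in for the test bookkeeping used by the snippet. */
static struct {
    int error_count;
} globals = { 0 };

/* Compare two int expressions; on mismatch, report it and count a failure. */
#define EXPECT_EQUALS(a, b)                                               \
    do {                                                                  \
        int a_val_ = (a);                                                 \
        int b_val_ = (b);                                                 \
        if (a_val_ != b_val_) {                                           \
            printf("FAILED at line %d in function %s! %s != %s\n"         \
                   "  a: %s is %d\n  b: %s is %d\n\n",                    \
                   __LINE__, __func__, #a, #b, #a, a_val_, #b, b_val_);   \
            globals.error_count++;                                        \
        }                                                                 \
    } while (0)
```

Each argument is evaluated once and stored, so an expression with side effects is not run a second time when the failure message is printed.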

Sample output:

$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!

References:

This question & other answers here served as inspiration and gave some insight (Case Insensitive String Comparison in C)
http://www.cplusplus.com/reference/cstring/strncmp/
https://en.wikipedia.org/wiki/ASCII
https://en.cppreference.com/w/c/language/operator_precedence
Undefined Behavior research I did to fix part of my code above (see comments below):
1. Google search for "c undefined behavior reading outside array bounds"
2. Is accessing a global array outside its bound undefined behavior?
3. https://en.cppreference.com/w/cpp/language/ub - see also the many really great "External links" at the bottom!
4. 1/3: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
5. 2/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
6. 3/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
7. https://blog.regehr.org/archives/213
8. https://www.geeksforgeeks.org/accessing-array-bounds-ccpp/

Topics to further research

(Note: this is C++, not C) Lowercase of Unicode character
tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

TODO:

Make a version of this code which also works on Unicode's UTF-8 implementation (character encoding)!

`in part because it isn't correct since ...` you code isn't correct either. There is no point to use [tolower](https://en.cppreference.com/w/cpp/string/byte/tolower), it's going to be by far the slowest part of the function. If you really want your function to be locale aware and handle non-ascii chars then you have to cast your chars to unsigned first. Otherwise, your code results in UB — Pavel P, Jul 25 '20 at 08:41
@PavelP, I'm really not following what you're saying. Why is there no point in using `tolower()`, when that's how we get the case-insensitive effect, which is the point of this question? Also, you linked to the C++ reference for it instead of the C reference for it. Doesn't that changes things? I never said my function was locale-aware or that it could handle non-ASCII chars, but I really don't see how casting to `unsigned char` first solves anything. All chars can be cast to unsigned. I don't understand your comment. — Gabriel Staples, Jul 25 '20 at 21:18
I've updated my answer to specify it's for ASCII chars only. Also, if you write an answer to clarify what you mean that'd be helpful. Lastly, I have no non-ASCII locale (I mean no non-`"C"` locale, [which according to `setlocale()` is the default at program startup](http://www.cplusplus.com/reference/clocale/setlocale/)) or unicode experience in C or C++. I ask you to be thorough enough in any answer you may write to clarify these points and things. — Gabriel Staples, Jul 25 '20 at 21:20
For ascii only I'd never use std::tolower, better do it manually: `static int tolower(char c){ return (c >= 'A' && c <= 'Z') ? (c | ' ') : c; }`. std::tolower is very slow because it's locale aware. — Pavel P, Jul 26 '20 at 14:54
For non ascii [you have to cast chars to unsigned char](https://stackoverflow.com/a/21805970/468725) before calling `tolower`/`toupper` — Pavel P, Jul 26 '20 at 14:55
Voting down this solution - it advertizes to be a drop-in/tested solution, but a simple additional test using `""` shows that it will not behave like the linux/windows version of it, returning `strncmpci("", "", 0) = -9999` instead of `0` — GaspardP, Oct 21 '20 at 18:07
Hi @GaspardP, thanks for pointing out this edge case. I've fixed my code now. The fix was simple. I initialized `ret_code` to `0` instead of to `INT_MIN` (or `-9999` as it was in the code you tested), and then set it to `INT_MIN` only if one of the input strings is a `NULL` ptr. Now it works perfectly. The problem was simply that for `n` is 0, none of the blocks were entered (neither the `if` nor the `while`), so it simply returned what I had initialized `ret_code` to. Anyway, it's fixed now, & I've cleaned up my unit tests _a ton_ and added in the test you mentioned. Hopefully you upvote now. — Gabriel Staples, Oct 22 '20 at 08:17
`(*str1 || *str2) && (chars_compared < num)` appears to have the test in the wrong order. `chars_compared < num` should go first, else code is accessing data 1 too far. Result UB. — chux - Reinstate Monica, Jan 28 '21 at 19:21
@chux-ReinstateMonica, I agree if we use the `*str1` or `*str2` value after `chars_compared >= num`, we are reading outside the bounds of our char array and what we read is not known, _but is it UB (Undefined Behavior) to **read and then discard** a value like this?_ Because it is true I am reading 1 too far--outside the bounds on C-strings which are NOT null-terminated, but I'm not using those readings. — Gabriel Staples, Jan 28 '21 at 19:50
Forming the address 1-past the range is OK. Dereferencing that address, as code does here with `*str1`, is UB. Code here is "using" it in that it attempting to read through that pointer. The UB is _usually_ benign, yet remains UB. The whole point of a size parameter is to prevent access outside the array bounds - which this code violated. With `num == 0`, nothing should be read. — chux - Reinstate Monica, Jan 28 '21 at 19:55
@chux-ReinstateMonica, after doing some additional study, I agree with everything you said. Reading outside an array bounds is UB, even if the value is just checked to see if it is zero and then discarded. I'll fix it. — Gabriel Staples, Jan 28 '21 at 21:10
@chux-ReinstateMonica. Answer fixed. I'll fix my git repo this code is in later--maybe tonight. — Gabriel Staples, Jan 28 '21 at 21:26
Perhaps post on [code review](https://codereview.stackexchange.com). Might get additional good feedback. I see about 4+ points to consider. — chux - Reinstate Monica, Jan 28 '21 at 21:53
Posted. This is my first question on that site: https://codereview.stackexchange.com/questions/255344/case-insensitive-strncmp-for-ascii-chars-only-not-utf-8. — Gabriel Staples, Jan 28 '21 at 22:06

score 4 · Answer 7 · answered May 23 '19 at 15:37

As others have stated, there is no portable function that works on all systems. You can partially circumvent this with simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}

answered May 23 '19 at 15:37

Miljen Mikic

this reminds me that `strings.h` (with an `s`), is not the same as `string.h`.... I've spent some time looking from `strcasecmp` on the wrong one.... – Gustavo Vargas Nov 08 '21 at 22:09
@GustavoVargas Me too, then I decided to write it here and save time for the future myself and others :) – Miljen Mikic Nov 09 '21 at 08:40

score 1 · Answer 8 · answered Dec 27 '15 at 03:36

You can get an idea of how to implement an efficient one, if your library doesn't provide it, from here.

It uses a lookup table covering all 256 char values:

for every character except upper case letters, the table holds the character's own ASCII code;
for upper case letters, the table holds the code of the corresponding lower case letter.

Then we just need to traverse the strings and compare the table entries for each pair of chars:

const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);
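
The fragment above leaves out `charmap` and the enclosing function. A self-contained sketch of the same idea follows. The names `charmap_init()` and `charmap_casecmp()` are assumptions for illustration, and the table is filled at run time with `tolower()` rather than written out as 256 literals. The strings are read through `unsigned char` pointers so that a byte above 127 cannot produce a negative table index.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char charmap[UCHAR_MAX + 1];

/* Assumed helper: map every byte to its lower-case form, once. */
void charmap_init(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        charmap[i] = (unsigned char)tolower(i);
}

int charmap_casecmp(const char *s1, const char *s2)
{
    const unsigned char *us1 = (const unsigned char *)s1;
    const unsigned char *us2 = (const unsigned char *)s2;

    /* Walk both strings while the mapped characters agree. */
    while (charmap[*us1] == charmap[*us2++])
        if (*us1++ == '\0')
            return 0;               /* reached the end of both strings */
    /* us2 was already advanced, so step it back before subtracting. */
    return charmap[*us1] - charmap[*--us2];
}
```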

ericcurtin · Answer 9 · 2019-12-14T11:09:35.983

Simple solution:

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}

edited Dec 14 '19 at 11:09

answered Dec 14 '19 at 10:33

ericcurtin

score 0 · Answer 10 · answered Feb 14 '16 at 10:17

static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {

        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    if (k != length)
        return 1;
    return 0;
}

Reference

answered Feb 14 '16 at 10:17

smamran

The `OR`ing idea is kind of nifty, but the logic is flawed. For example, `ignoreCaseComp("\`", "@", 1)` and perhaps more importantly, `ignoreCaseComp("\0", " ", 1)` (i.e. where all bits other than bit 5 (decimal 32) are identical) both evaluate to `0` (match). – user966939 May 30 '19 at 17:59
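
The cases in the comment above can be avoided by applying the `| 32` trick only to bytes that are ASCII upper case letters. The sketch below is one possible variant, not the answer's code: the names `fold_ascii()` and `ignoreCaseCompFixed()` are made up for illustration, it assumes ASCII, and unlike the original it also stops at a null terminator. The return convention is kept (0 for a match, 1 otherwise).

```c
/* Fold an ASCII upper-case letter to lower case; leave every other byte alone. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 32) : c;
}

/* Returns 0 when the first `length` characters match ignoring ASCII case,
 * 1 otherwise. */
int ignoreCaseCompFixed(const char *str1, const char *str2, int length)
{
    for (int k = 0; k < length; k++) {
        unsigned char c1 = fold_ascii((unsigned char)str1[k]);
        unsigned char c2 = fold_ascii((unsigned char)str2[k]);
        if (c1 != c2)
            return 1;
        if (c1 == '\0')
            break;      /* both strings ended together */
    }
    return 0;
}
```

With this variant, the backtick against `"@"` and `"\0"` against `" "` are both reported as mismatches, while `"AbC"` against `"aBc"` still matches.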

The Oathman · Answer 11 · 2022-02-23T20:48:35.457

If we have null-terminated strings:

    bool striseq(const char* s1, const char* s2) {
        for (; *s1;) {
            if (tolower((unsigned char)*s1++) != tolower((unsigned char)*s2++))
                return false;
        }
        return *s1 == *s2;
    }

or with this version that uses bitwise operations:

    int striseq(const char* s1,const char* s2)
       {for(;*s1;) if((*s1++|32)!=(*s2++|32)) return 0; return *s1 == *s2;}

I'm not sure whether this works with symbols; I haven't tested that, but it works fine with letters.

jaldk · Answer 12 · 2016-01-22T11:40:37.220

int strcmpInsensitive(char* a, char* b)
{
    return strcmp(lowerCaseWord(a), lowerCaseWord(b));
}

char* lowerCaseWord(char* a)
{
    char *b=new char[strlen(a)];
    for (int i = 0; i < strlen(a); i++)
    {
        b[i] = tolower(a[i]);   
    }
    return b;
}

good luck

Edit: the lowerCaseWord function takes a char* and returns its lower case version. For example, "AbCdE" yields "abcde".

Basically, it converts both char* values to lower case and then calls the strcmp function on the results.

For example, if we call the strcmpInsensitive function with "AbCdE" and "ABCDE", it first converts both values to lower case ("abcde") and then runs strcmp on them.

edited Jan 22 '16 at 11:40

answered Jan 21 '16 at 21:51

jaldk

some explanation could go a long way – davejal Jan 21 '16 at 22:29
It seems wholly inefficient to lower both input strings, when the function "might" return as soon as after the first character compare instead. e.g. "ABcDe" vs "BcdEF", could return very quickly, without needing to lower or upper anything other than the first character of each string. – T.S Feb 07 '17 at 15:34
Not to mention leaking memory twice. – Ruud van Gaal Feb 11 '17 at 22:17
You don't null-terminate your lower case strings, so the subsequent `strcmp()` might crash the program. – sth Feb 20 '17 at 19:22
You also compute strlen(a) a total of strlen(a)+1 times. That together with the loop itself and you're traversing a strlen(a)+2 times. – Stefan Vorkoetter Jul 03 '18 at 13:53
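
The comments above point out several problems: the copies are never freed, they are not null-terminated, and `strlen()` is recomputed on every iteration. The answer's code also uses `new`, which is C++ rather than C. A sketch of the same lower-case-then-`strcmp()` idea in plain C that addresses those points follows. The function names are illustrative assumptions, and returning `INT_MIN` on allocation failure is a choice made only for this sketch.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated lower-case copy of s, or NULL if malloc fails.
 * The caller must free() the result. */
char *lower_case_copy(const char *s)
{
    size_t len = strlen(s);               /* computed once */
    char *copy = malloc(len + 1);         /* +1 for the terminator */
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        copy[i] = (char)tolower((unsigned char)s[i]);
    copy[len] = '\0';
    return copy;
}

int strcmp_insensitive(const char *a, const char *b)
{
    char *la = lower_case_copy(a);
    char *lb = lower_case_copy(b);
    int result = INT_MIN;                 /* sketch-only error value */
    if (la != NULL && lb != NULL)
        result = strcmp(la, lb);
    free(la);                             /* free(NULL) is a no-op */
    free(lb);
    return result;
}
```

This still lower-cases both strings in full before comparing, so the efficiency concern raised in the comments remains; the character-by-character loops in the other answers avoid both the allocation and the extra passes.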
