
Does anybody know the exact C or C++ (either one will do) analog of C#'s case-insensitive string.Compare? It turns out that _wcsicmp behaves differently, although both are supposed to use the current locale or culture (which is en_US).

With string.Compare(..., true),
 or string.Compare(..., StringComparison.CurrentCultureIgnoreCase),
 or string.Compare(..., StringComparison.InvariantCultureIgnoreCase):
'~' before '+',
'=' before number,
letter before single quote

_wcsicmp, or _wcsicmp_l with an explicit locale (LC_ALL, L"en_US"), puts them in the opposite order. std::wcscoll gives exactly the same result.
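For reference, a plain code-point comparison of the three pairs above also gives the opposite of what string.Compare reports. Here is a minimal sketch, assuming that a locale-independent std::wcscmp is a fair stand-in for "ordinal" order; the helper name ordinalSign is made up for illustration.

```cpp
#include <cwchar>

// Hypothetical helper: compare by code point and reduce the result to -1, 0 or 1.
// std::wcscmp does not consult any locale.
int ordinalSign(const wchar_t* a, const wchar_t* b)
{
    int r = std::wcscmp(a, b);
    return (r > 0) - (r < 0);
}
```

ordinalSign(L"~", L"+"), ordinalSign(L"=", L"1") and ordinalSign(L"a", L"'") all return 1, so in code-point order the first item of each pair sorts after the second, which is the reverse of the string.Compare order listed above.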

I can reproduce it using a character table, but maybe there is a better way. Thanks!

===== Probably nobody knows. I am posting the workaround, which is unnecessary with C#. It handles the ASCII subset (0-127, which I mostly care about) with a lookup table and, partially, the rest of the Unicode table by falling back to locale collation:

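#include <cwchar>     // std::wcscoll
#include <string>     // std::wstring
#include <windows.h>  // LPCWSTR

// Declared first so that the std::wstring overload below calls this function
// instead of recursing into itself through an implicit wstring conversion.
int compareNoCase(LPCWSTR a, LPCWSTR b, int size = -1);

int compareNoCase(const std::wstring& a, const std::wstring& b, int size = -1)
{
    return compareNoCase(a.c_str(), b.c_str(), size);
}

int compareNoCase(LPCWSTR a, LPCWSTR b, int size)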
{
    static const unsigned char table[] = {
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
        0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x63, 0x27, 0x28, 0x29, 0x3d, 0x2a, 0x64, 0x2b, 0x2c,
        0x3f, 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x2d, 0x2e, 0x2f, 0x3e, 0x30, 0x31,
        0x32, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
        0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x33, 0x34, 0x35, 0x36, 0x37,
        0x38, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
        0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x39, 0x3a, 0x3b, 0x3c, 0x65
    };

    for (int i = 0; size < 0 || i < size; i++) {
        wchar_t ca = a[i];
        wchar_t cb = b[i];
        if (ca == 0 || cb == 0) {           // if at least one of the strings is over:
            return (ca == 0) ? ((cb == 0) ? 0 : -1) : 1;
        }
        if (ca != cb) {                     // if next characters are different, go in
            if (ca < 0x80 && cb < 0x80) {   // if both characters are ASCII (0x00-0x7f), use table
                if (table[ca] != table[cb]) {
                    return (table[ca] > table[cb]) ? 1 : -1;
                }
            }
            else {                          // otherwise fall back to the current locale's collation
                int ret = std::wcscoll(a + i, b + i);
                if (ret != 0) {
                    return ret;
                }
            }
        }
    }
    return 0;
}

The table does not cover characters that are forbidden in file names. Explorer-style "numeric" comparison is not related to this question. I also removed the handling of multiple locales for clarity.
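A quick way to sanity-check the table is to confirm that it reproduces the three orderings listed in the question. The sketch below is only that: it copies the table, and compareByRank and matchesDocumentedOrder are hypothetical helper names that handle ASCII-only input and skip the locale fallback.

```cpp
#include <cstddef>
#include <string>

// Copy of the rank table from the workaround above (one entry per ASCII code).
static const unsigned char kRank[128] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
    0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x63, 0x27, 0x28, 0x29, 0x3d, 0x2a, 0x64, 0x2b, 0x2c,
    0x3f, 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x2d, 0x2e, 0x2f, 0x3e, 0x30, 0x31,
    0x32, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
    0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x33, 0x34, 0x35, 0x36, 0x37,
    0x38, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
    0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x39, 0x3a, 0x3b, 0x3c, 0x65
};

// Hypothetical helper: compare two ASCII-only strings by table rank.
// Returns -1, 0 or 1. Non-ASCII input is out of scope for this sketch.
int compareByRank(const std::wstring& a, const std::wstring& b)
{
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        unsigned ra = kRank[a[i] & 0x7f];
        unsigned rb = kRank[b[i] & 0x7f];
        if (ra != rb) {
            return ra < rb ? -1 : 1;
        }
    }
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;   // the shorter string sorts first
}

// True if the table reproduces the orderings listed in the question.
bool matchesDocumentedOrder()
{
    return compareByRank(L"~", L"+") < 0        // '~' before '+'
        && compareByRank(L"=", L"1") < 0        // '=' before a digit
        && compareByRank(L"a", L"'") < 0        // letter before single quote
        && compareByRank(L"abc", L"ABC") == 0;  // case is ignored
}
```

Upper- and lower-case letters share the same rank in the table, which is what makes the comparison case-insensitive.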

Please let me know if anybody has a better idea!

  • There is no such language as C/C++. They are two different languages and the answer will be different for the two. Please pick only one. – kaylum Jan 10 '22 at 05:52
  • @kaylum - ok, C OR C++. I am aware that they are different languages, but I would be happy to get an answer in any of them. Otherwise I'd be more specific. – Andrei Kalantarian Jan 10 '22 at 05:58
  • @kaylum, can you answer the question for C language? for C++ language? – Andrei Kalantarian Jan 10 '22 at 06:01
  • Jumping to conclusions a bit? I didn't downvote. And I don't need to be able to answer the question to suggest ways for the question to be improved to be in line with Stack Overflow guidelines. – kaylum Jan 10 '22 at 08:53
  • @kaylum, my apologies. The thing is, if a question is simple and completely lookup-able, it is greatly upvoted and readily answered. Messed-up terminology or poor English is forgiven. Conversely, if the question seems out of reach, it is immediately downvoted. For some reason, my "C" tag was removed (probably folded into "C++"). I provided the example only in C because it seemed simple to show what I was looking for. – Andrei Kalantarian Jan 10 '22 at 13:41
  • Don't worry about it too much. But just be aware that any community has norms and expectations. Which some members may find difficult to understand or even may not agree with at first. One of those is that each question should only ask about one thing, including one language. There are various reasons for that and if you want to discuss or debate it there is [meta] which is a forum for Stack Overflow policies, usage, etc. Anyway, no big deal and I hope Stack Overflow continues to help you in many things. – kaylum Jan 10 '22 at 19:56

1 Answer


In C on POSIX systems there is a function for this in strings.h called strcasecmp(). With Visual Studio on Windows you may need an equivalent, as noted in this answer: error C3861: 'strcasecmp': identifier not found in visual studio 2008?

So you could write something like this.

#include <stdio.h>

#ifdef _MSC_VER
//not #if defined(_WIN32) || defined(_WIN64) because we have strncasecmp in mingw
#include <string.h>   // _stricmp, _strnicmp
#define strncasecmp _strnicmp
#define strcasecmp _stricmp
#else
#include <strings.h>  // strcasecmp, strncasecmp (POSIX)
#endif

int main (void) {
    const char *str1 = "this is a string";
    const char *str2 = "THIS IS A STRING";

    if (strcasecmp(str1, str2) == 0) printf("Strings match\n");

    return 0;
}

If you really wanted, you could use this same approach with C++ strings by using their c_str() function.

strcasecmp(str1.c_str(), str2.c_str())

But really you'd be better off using boost::iequals(str1, str2)

#include <stdio.h>
#include <string>
#include <boost/algorithm/string.hpp>

int main () {
    std::string str1 = "this is a string";
    std::string str2 = "THIS IS A STRING";

    if (boost::iequals(str1, str2)) printf("Strings match\n");

    return 0;
}
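If Boost is not an option, roughly the same equality check can be sketched with the standard library alone. This is not what boost::iequals does internally. It is a simplified stand-in that only folds case through std::tolower in the current C locale, and the name iequalsAscii is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for boost::iequals: case-insensitive equality for
// single-byte strings, using std::tolower on each character.
bool iequalsAscii(const std::string& a, const std::string& b)
{
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```

Like strcasecmp, this only answers "equal or not" and says nothing about sort order.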
codyne
  • Thanks! It boils down to the same _stricmp, which is locale-aware, and that is somehow different from what C# is using. – Andrei Kalantarian Jan 10 '22 at 06:14
  • I would assume the locale to be the "gold standard", at least in a Windows environment, but it looks like .NET does something not exactly compatible. The difference is in 3 cases involving special characters. Incidentally, C#'s sorting corresponds to file name sorting in File Explorer (if you ignore consecutive digits, which is a different story). I was curious whether there is a legitimate way to implement that sorting order in C or C++ without resorting to a character table. – Andrei Kalantarian Jan 10 '22 at 06:28
  • C# also has locale-aware comparison. It just uses your default culture. What you really need is to get the right locale. You'll need to read the documentation on cultures and locales in C# and also for C or C++ in the Windows API. – siride Jan 10 '22 at 14:52
  • @siride, I tried comparing as lowercase using , and it doesn't work. Maybe the thing is to play with std::locale. Theoretically it should use default locale (en_US). The spec did not tell, that _locale_t is functionally identical to std::locale. Will check it out, thanks! – Andrei Kalantarian Jan 10 '22 at 14:58
  • Nope, same 3 differences like with _locale_t. Was worth a try though, I can replace _wcsnicmp with std::strcoll and make it portable (who knows, maybe I will need it). – Andrei Kalantarian Jan 10 '22 at 15:22
  • I took a look at the source code for String and ultimately it calls some C++ library function: https://referencesource.microsoft.com/#mscorlib/system/globalization/compareinfo.cs,1356715f78447ed6,references. Here's this: https://stackoverflow.com/a/23118596/394487. So maybe you can use that COM function. I can't seem to find any documentation for it, though. – siride Jan 10 '22 at 16:59
  • @siride, thanks, that means, that compare is handled by the undocumented library function. It uses "locale name" rather than basic _locale_t or std::locale, so it is the end of the road. Thank you - you answered the question, and the answer is "there is no exact analog". – Andrei Kalantarian Jan 10 '22 at 22:39