-1

I am trying to order mixed strings in a unique way and am wondering if anyone else has done this. I have found several articles on using IOrderConparer but can not find a solution for my particular sorting problem.

I have the following:

1017, 650, 650C, 650B, W323, 10, 20, 1000, W1000

I need to order them as follows:

10, 20, 650, 650B, 650C, 1000, 1017, W323, W1000

Any help would be appreciated. Thanks.

Cyberdrew
  • 1,832
  • 1
  • 19
  • 39
  • What is the logic behind your ordering? – Chris Aug 11 '15 at 19:36
  • 1
    Yeah, except you didn't bother reading the question. He was asking about sort order. He wants logical, not string sorting. – SledgeHammer Aug 11 '15 at 19:38
  • @TommyGrovnes: So is Stack Overflow. – Cyberdrew Aug 11 '15 at 19:48
  • 1
    A custom IComparer would be a better choice in the .Net paradigm, not PInvoke using unmanaged code – Mrinal Kamboj Aug 11 '15 at 21:04
  • Agree with Mrinal Kamboj, the provided search contains multiple opportunities to learn c#, see example of how the first result in the search can work @SledgeHammer Cyberdrew et al. – Tommy Grovnes Aug 11 '15 at 21:09
  • Actually, the FASTER method would be preferable. Your managed comparer is super slow compared to the pinvoke. Over 10x slower. – SledgeHammer Aug 11 '15 at 21:30
  • 1
    Also lets understand PInvoke is someone else's code that you are projecting, first create your own version before you jump on the performance bandwagon – Mrinal Kamboj Aug 11 '15 at 21:42
  • Create my own? Why? I've got better things to do then reinvent the wheel of existing APIs for no good reason. The original C# regex / linq solution posted will ALWAYS be oodles slower then the pinvoke solution whether its 1 string or 10M strings. If I was your boss and caught you reinventing wheels for no reason just because you don't like using pinvoke, I would absolutely fire you and find an engineer who was more interested in solving the actual tasks rather then self imposed rules like that. Do you reinvent everything? .NET does not cover the entire Win32 API. Not by a long shot. – SledgeHammer Aug 12 '15 at 14:06
  • 1
    @Cyberdrew Check this out as the solution http://www.dotnetperls.com/alphanumeric-sorting Purely managed code, would be quick too – Mrinal Kamboj Aug 12 '15 at 22:08

2 Answers2

1

Implement the compare using StrCmpLogicalW:

[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
public static extern int StrCmpLogicalW(string x, string y);

This will do a logical (taking numbers into account) string compare rather then a standard string based one which will give you the result you are after.

SledgeHammer
  • 7,338
  • 6
  • 41
  • 86
  • You rock! Works like a charm! – Cyberdrew Aug 11 '15 at 19:45
  • 1
    Why PInvoke for logical string comparison, there should be a managed way to achieve it – Mrinal Kamboj Aug 11 '15 at 21:02
  • LOL... well, the pinvoke version is over 10x faster then your managed version, so I think I'll use the pinvoke version since I care about performance :). – SledgeHammer Aug 11 '15 at 21:29
  • In your dreams PInvoke will beat managed code, you may want to retrieve your statement, else its pretty clear you have no idea regarding .Net framework and its working. Don't go about claiming it, since what you are looking at is not most efficient IComparer code. Also when you make such statements, then care to post some data to support your version – Mrinal Kamboj Aug 11 '15 at 21:36
  • In my dreams? Did you actually read the code he wrote? RegEx, Linq, etc. Those are very slow technologies (and you didn't cache / precompile anything). The pinvoke is calling the C++ version in shlwapi.dll that is finely tuned for performance and doesn't use any of that crap. I actually DID test it out because I suspected the code would be very slow. I simply wrote a loop around the Array.Sort line and timed the diff with replacing the guts. Pinvoke=36ms, your version 471ms :). Not saying your code could not be improved, but what he posted is slow as **** :). – SledgeHammer Aug 11 '15 at 21:56
  • The point wasn't speed, the point was a single google search provides a bunch of starting points for someone who obviously needs help with some homework. Nice copied answer btw ;) – Tommy Grovnes Aug 11 '15 at 22:10
1

A bit longer but all managed code

public class LogicalSorter : IComparer
{
    public int Compare(object a, object b)
    {
        var first = Regex.Split((string)a,"([0-9]+)").Where(s => s != "").ToArray();
        var second = Regex.Split((string)b,"([0-9]+)").Where(s => s != "").ToArray();

        var endIdx = Math.Min(first.Count(), second.Count());

        for (var i = 0; i < endIdx; i++)
        {
            var part1 = first.ElementAt(i);
            var part2 = second.ElementAt(i);

            if (part1.All(char.IsDigit) && part2.All(char.IsDigit) && part1 != part2)
            {
                return int.Parse(part1).CompareTo(int.Parse(part2));
            }

            if (part1 != part2) return part1.CompareTo(part2);
        }

        return first.Count().CompareTo(second.Count());
    }
}

Use it like this

string[] values = { "1017", "650", "650C", "650B", "W323", "10", "20", "1000", "W1000" };

Array.Sort(values, new LogicalSorter());

foreach (var value in values)
   Console.WriteLine(value);

Or using generics as suggested by Mrinal (preferred)

public class LogicalSorter : IComparer<String>
{
    public int Compare(String a, String b)
    {
        var first = Regex.Split(a, "([0-9]+)").Where(s => s != "").ToArray();
        var second = Regex.Split(b, "([0-9]+)").Where(s => s != "").ToArray();

        var endIdx = Math.Min(first.Count(), second.Count());

        for (var i = 0; i < endIdx; i++)
        {
            var part1 = first.ElementAt(i);
            var part2 = second.ElementAt(i);

            if (part1.All(char.IsDigit) && part2.All(char.IsDigit) && part1 != part2)
            {
                return int.Parse(part1).CompareTo(int.Parse(part2));
            }

            if (part1 != part2) return part1.CompareTo(part2);
        }

        return first.Count().CompareTo(second.Count());
    }
}

Example of optimized managed code (for speed not for looks), performs at 47x the regex version

public class LogicalSorter : IComparer<String>
{
    public int Compare(String a, String b)
    {
        var aLength = a.Length;
        var bLength = b.Length;
        var aIdx = 0;
        var bIdx = 0;
        int aPartLen;
        int bPartLen;
        int aPartEndIndex;
        int bPartEndIndex;
        bool aIsString;
        bool bIsString;

        // Examine both strings on character level, keep track of where
        // we are in each string since lengths might differ
        while (aIdx < aLength && bIdx < bLength)
        {
            // If both strings contain digit at current index
            // compare numbers
            if (char.IsDigit(a[aIdx]) && char.IsDigit(b[bIdx]))
            {
                // Get longest consecutive list of digits from each string
                aPartEndIndex = aIdx;
                while (aPartEndIndex < aLength && char.IsDigit(a[aPartEndIndex])) { aPartEndIndex++; }

                bPartEndIndex = bIdx;
                while (bPartEndIndex < bLength && char.IsDigit(b[bPartEndIndex])) { bPartEndIndex++; }

                aPartLen = aPartEndIndex - aIdx;
                bPartLen = bPartEndIndex - bIdx;

                // Compare lengths (longest number is greater)
                if (aPartLen != bPartLen) return aPartLen < bPartLen ? -1 : 1;

                // Same length numbers, compare chars until not same or end
                while (aIdx < aPartEndIndex && a[aIdx] == b[bIdx])
                {
                    aIdx++;
                    bIdx++;
                }

                // If not at end compare last characters that were not same
                if(aIdx != aPartEndIndex)
                    return a[aIdx] < b[bIdx] ? -1 : 1;
            }
            else
            {
                // Comparing string vs number or string vs string
                aIsString = char.IsLetter(a[aIdx]);
                bIsString = char.IsLetter(b[bIdx]);

                // if not 2 strings, number is always first
                if (aIsString != bIsString) return aIsString ? 1 : -1;

                // Get longest consecutive list of letters from each string
                aPartEndIndex = aIdx;
                while (aPartEndIndex < aLength && (char.IsLetter(a[aPartEndIndex]) == aIsString))
                {
                    aPartEndIndex++;
                }

                bPartEndIndex = bIdx;
                while (bPartEndIndex < bLength && (char.IsLetter(b[bPartEndIndex]) == bIsString))
                {
                    bPartEndIndex++;
                }

                // Compare chars until not same or end
                while (aIdx < aPartEndIndex && bIdx < bPartEndIndex && a[aIdx] == b[bIdx])
                {
                    aIdx++;
                    bIdx++;
                }

                // if not at end compare last letters found
                if ((aIdx != aPartEndIndex) || (bIdx != bPartEndIndex))
                    return a[aIdx] < b[bIdx] ? -1 : 1;
            }
        }

        // Use length as tie breaker
        return aLength < bLength ? -1 : 1;
    }
}
Tommy Grovnes
  • 4,126
  • 2
  • 25
  • 40
  • Make it IComparer that way you do not need object comparison and typecasting, it would string comparison and remaining would remain same – Mrinal Kamboj Aug 11 '15 at 21:18
  • Of course I was trying to solve it using the first link of the google search, but yeah you are absolutely right, edited answer – Tommy Grovnes Aug 11 '15 at 21:22
  • Still sloooooowww... for a loop of 10,000, your version = ~500ms... StrCmpLogicalW = 40ms. C++ code in general is always going to be faster then C# code, especially when the C# version is poorly written. You build the same exact RegEx parser 20,000 times for starters. LOL. – SledgeHammer Aug 11 '15 at 22:09
  • The point wasn't speed, the point was a single google search provides a bunch of starting points for someone who obviously needs help with some homework. Nice copied answer btw ;) – Tommy Grovnes Aug 11 '15 at 22:11
  • For the sake of argument, I fixed your poor use of RegEx and now it only builds the expression once, and that knocked it down to ~300ms... so just that one tweak is a huge improvement. – SledgeHammer Aug 11 '15 at 22:13
  • Tweaked some more for the sake of argument – Tommy Grovnes Aug 12 '15 at 01:13
  • 1
    Hey, nice job on this. Hard to follow, but we were talking performance, so this version beats Pinvoke / StrCmpLogicalW. ~250ms vs. ~100ms on 100k iterations (tested on a different faster PC then earlier). I even up voted your answer to show you I'm a good sport :). With that being said, personally in a work project, unless I was specifically dialing it in for every single tick of performance (which you rarely do), I'd probably just use the built in stuff. My boss probably wouldn't like spending a day tweaking the hell out of a built in function unless it was horrible :). But again, nice job!:) – SledgeHammer Aug 12 '15 at 06:02
  • 1
    @ SledgeHammer please understand that idea is not to discredit solution what you have pasted, as it seems that is a standard and optimized way in the win32 apis, but we need to be aware of the cost involved in doing so when we cross the managed boundary, mostly this is done for critical operations, which are unavailable in the managed code. For current scenario, there will be multiple solutions available using managed code. Also if we have big collection say 1000 strings or more, you would find PInvoke will incrementally make things slower vis a vis managed code – Mrinal Kamboj Aug 12 '15 at 08:17
  • @MrinalKamboj, Jeez, lets get real. In the response right above yours, I posted the times for *100K* calls and the diff with Pinvoke vs his super optimized version was roughly 150ms. That isn't perceivable by a human. Starts to be perceivable at 250ms to 500ms+. Also, logical sorting is usually a UI op. You design UIs that display 100K strings to the user? Sounds very user friendly :). Also, keep in mind, a huge portion of .NET calls pinvoke under the covers anyways :). – SledgeHammer Aug 12 '15 at 15:00
  • Alright @ SledgeHammer continue calling PInvoke as much as possible, its your choice, but who told you sorting is just a UI operation, so many application which paginate using server memory extensively sort data there, Datatable can contain many thousand to millions of record in the memory, just that UI renders a small subset. Do not compare .Net calling Win32 dlls to PInvoke, MS doesn't expose their private calls and mechanism for public consumption, there internal mechanism would be far more optimized. When I get time, let me create a good ICompare solution for you to test – Mrinal Kamboj Aug 12 '15 at 17:00
  • @MrinalKamboj, I don't call it as much as possible LOL. In this case I would because the overhead is negligible as demonstrated with 100K calls. How do you write real applications without at least SOME pinvoke? Like how would you get the current Windows theme with pure C#? How do you remove the Window icon from a WPF window without pinvoke? How do you do low level window hooks without pinvoke? I just gave you 3 real examples that you could not possibly do in 100% C#. To flat out refuse to use pinvoke because you think its a huge perf hit is just a stupid mentality to have. – SledgeHammer Aug 12 '15 at 20:29
  • @ SledgeHammer you are happy with 100K calls with data size (9 elements, this is genuine LOL), good for nothing assuming this to be UI only operation. You can try 1 bn calls, result would remain same, along with it certainly needs a better IComparer. In your own words, "SOME PInvoke?", did I ever said should never use, there are genuine use case as suggested, we had a discussion only in current question context and you have taken it around the world with good for nothing examples, which do not fit in current context. I am Ok with you using PInvoke for every other operation, carry on – Mrinal Kamboj Aug 12 '15 at 20:43
  • @MrinalKamboj, you keep saying I use pinvoke for everything. I only use it when needed. You seem to avoid it like the plague and reinvent everything. Yes, the pinvoke will get progressively worse with huge data sets. At 5M compares, the optimized C# solution beats it by 4s on my slower work PC. Would I spend a few hours developing a pure C# solution to save 4s? That will probably run in 1s on a faster PC? Probably not. Would love to see your falsified time sheets explaining why you did that to your boss :). – SledgeHammer Aug 12 '15 at 21:34
  • @MrinalKamboj... so you keep saying you are an awesome C# developer and can beat everybody? Where is your IComparer implementation that can beat this one? – SledgeHammer Aug 12 '15 at 21:35
  • @SledgeHammer dude you come up with points I never said, they are at tangent to the discussion. My points related to PInvoke were simply following what MS documentation and what many others authors suggest regarding its usage. I should be able to come up with some implementation, as I get time out of work, not to prove / disprove anything but to work and learn a valid solution of my own, anyway, I should give up arguing, cannot surely compete with you, you are anyway getting too personal in your comments – Mrinal Kamboj Aug 12 '15 at 21:42
  • Try the following - http://www.dotnetperls.com/alphanumeric-sorting It must provide a good performance – Mrinal Kamboj Aug 12 '15 at 22:01
  • @MrinalKamboj, LOL... you are the one who started bashing. I posted numbers saying the RegEx / Linq solution was 10x slower then pinvoke and you called me a liar. I then posted the exact numbers and you tried to put your spin on it to keep face. You keep saying you can do better, but all you have done is run and provide links to somebody elses work. Where is yours? I await to see it! – SledgeHammer Aug 12 '15 at 22:22
  • @MrinalKamboj -- btw, your link sucks. Horrible performance. Worse then both pinvoke and the optimized C# version. Since you will call me a liar, here are the real numbers for 5M -- pinvoke = 13s, this C# = 9s, your link = 26s. LMAO!! Obviously that implemention is slow, it does Int.Parse and boxing / unboxing and lots of allocs -- but you only avoid pinvokes :). Surely you can beat 9s? :) – SledgeHammer Aug 12 '15 at 22:44
  • When I provide a code with a link of a well know site, I have copied someone's work and you have designed StrCmpLogicalW. I would certainly provide my IComparer version, though thanks for pointing the issue with the published code, you have agreed multiple times that managed code has ability to do better, but I think its important that you stick to what you think is best. Not even once I have used the word Liar, its your invention, assumption. You have used the colloquial and uncultured language with impunity. I hope you agree that even 9 sec C# code is not yours – Mrinal Kamboj Aug 13 '15 at 06:55
  • Your words are, "At 5M compares, the optimized C# solution beats it by 4s on my slower work PC. Would I spend a few hours developing a pure C# solution to save 4s?", its anyone's guess who is trying to save the face. Few hours is just once, code is reused forever, even StrCmpLogicalW would have taken few hours – Mrinal Kamboj Aug 13 '15 at 07:01
  • @MrinalKamboj-I never claimed I wrote StrCmpLogicalW. I said it was the correct and "best" solution. You bashed it, but failed to provide anything better. Somebody else posted a C# solution. I said it sucked and was 10x slower. You responded with "In your dreams" and again failed to provide anything meaningful. He re-wrote his code and fixed it. And yes, I will say it again -- at 5M compares, it is 4s faster. No, I would not spend an hour of dev time to save 4s of runtime. Thats stupid. Your link was the worst solution of all. Even worse then the original C#. – SledgeHammer Aug 13 '15 at 14:04
  • @MrinalKamboj -- you keep talking about providing a solution? Been waiting for it for 3 days now. LOL. Do you even know C# or just making things up? You claim you keep 1B strings in memory? LMAO!!! When was the last time you wrote an app that even kept 1M or even 10K strings in memory? Apparently you have machines with like 30 to 40GB+ of RAM hahah... yeah, nice try... and yeah, I'll stick to my thing that there is no reason to sort strings except for a human (unless its required by an algorithm). – SledgeHammer Aug 13 '15 at 14:05