11

I am facing a strange problem while sorting a list of strings with integer values. However some values could be prefixed with some characters.

e.g.

// B1, 5, 50, A10, 7, 72, B3, A1, A2

There are basically page numbers and should be sorted like:

// A1, A2, A10, B1, B3, 5, 7, 50, 72

But if I use default string sorting then these will be sorted like

// A1, A10, A2, B1, B3, 5, 50, 7, 72

Any solution for this in C#?

Michael Myers
  • 188,989
  • 46
  • 291
  • 292
Ramesh Soni
  • 15,867
  • 28
  • 93
  • 113
  • You can use this `NaturalStringComparer` that I put together and cleaned up a bit (Don't remember where I got the basis for it). It uses the Win32 function StrCmpLogicalW that Skizz mentions. http://my.opera.com/Svishy/blog/2009/03/02/natural-sorting – Svish Apr 24 '09 at 10:58

6 Answers6

17

You're looking for the Alphanum algorithm. Fortunately for you, a number of implementations exist already. See here.

John Feminella
  • 303,634
  • 46
  • 339
  • 357
5

This is how I solved it for our application, order will be like in a windows directory:

public class NaturalSortComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        return StrCmpLogicalW(x, y);
    }

    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    public static extern int StrCmpLogicalW(string x, string y);
}

Usage:

  NaturalSortComparer comparer = new NaturalSortComparer();
  return comparer.Compare(string1, string2);

But it's probably not exactly what you want:

// A1, A2, A10, B1, B3, 5, 7, 50, 72

This will give

// 5, 7, 50, 72, A1, A2, A10, B1, B3

Carra
  • 17,808
  • 7
  • 62
  • 75
3

What you are looking for is a Natural Sort.

Jeff Atwood made a great post on his blog once, explaining the concept and linking to various other sources with algorithms you could take as an example.

Sorting for Humans : Natural Sort Order

Cœur
  • 37,241
  • 25
  • 195
  • 267
lowglider
  • 1,087
  • 2
  • 11
  • 21
1

Here's a custom comparer that will sort into your required order. Note that there are no error/sanity checks in this code: it assumes that all the strings will be in the correct format.

public class MyComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        Match xMatch = Regex.Match(x, @"^(\D*)(\d+)$");
        Match yMatch = Regex.Match(y, @"^(\D*)(\d+)$");

        string xChars = xMatch.Groups[1].Value;
        string yChars = yMatch.Groups[1].Value;

        if ((xChars.Length == 0) && (yChars.Length > 0))
        {
            return 1;
        }
        else if ((xChars.Length > 0) && (yChars.Length == 0))
        {
            return -1;
        }
        else
        {
            int charsResult = xChars.CompareTo(yChars);

            return (charsResult != 0)
                ? charsResult
                : int.Parse(xMatch.Groups[2].Value)
                    .CompareTo(int.Parse(yMatch.Groups[2].Value));
        }
    }
}

You can use it like so:

List<string> testList =
    new List<string>() { "B1","5","50","A10","7","72","B3","A1","A2" };

testList.Sort(new MyComparer());    // A1, A2, A10, B1, B3, 5, 7, 50, 72
LukeH
  • 263,068
  • 57
  • 365
  • 409
0

Well, you could always pInvoke the Win32 API function StrCmpLogicalW which does exactly what you want (it's what Explorer uses to sort filenames). The only possible downside is that the sort is case insensitive.

Skizz
  • 69,698
  • 10
  • 71
  • 108
0

Not sure about performance, and sure this can be optimized, but it do the work:

string[] sort(string[] data)
{
    return data
        .OrderBy(s => Regex.Match(s, @"^\D").Length == 0)
        .ThenBy(s => Regex.Match(s, @"\D*").Value)
       .ThenBy(s => Int32.Parse(Regex.Match(s, @"\d+").Value)).ToArray();
}

var result = sort(new string[] { "B1", "5", "50", "A10", "7", "72", "B3", "A1", "A2" });
Kamarey
  • 10,832
  • 7
  • 57
  • 70