18

I have a list of strings that can contain a letter or a string representation of an int (max 2 digits). They need to be sorted either alphabetically or (when it is actually an int) on the numerical value it represents.

Example:

IList<string> input = new List<string>()
    {"a", 1.ToString(), 2.ToString(), "b", 10.ToString()};

input.OrderBy(s=>s)
  // 1
  // 10
  // 2
  // a
  // b

What I would want is

  // 1
  // 2
  // 10
  // a
  // b

I have some idea involving formatting it with trying to parse it, then if it is a successfull tryparse to format it with my own custom stringformatter to make it have preceding zeros. I'm hoping for something more simple and performant.

Edit
I ended up making an IComparer I dumped in my Utils library for later use.
While I was at it I threw doubles in the mix too.

public class MixedNumbersAndStringsComparer : IComparer<string> {
    public int Compare(string x, string y) {
        double xVal, yVal;

        if(double.TryParse(x, out xVal) && double.TryParse(y, out yVal))
            return xVal.CompareTo(yVal);
        else 
            return string.Compare(x, y);
    }
}

//Tested on int vs int, double vs double, int vs double, string vs int, string vs doubl, string vs string.
//Not gonna put those here
[TestMethod]
public void RealWorldTest()
{
    List<string> input = new List<string>() { "a", "1", "2,0", "b", "10" };
    List<string> expected = new List<string>() { "1", "2,0", "10", "a", "b" };
    input.Sort(new MixedNumbersAndStringsComparer());
    CollectionAssert.AreEquivalent(expected, input);
}
Boris Callens
  • 90,659
  • 85
  • 207
  • 305

10 Answers10

23

Two ways come to mind, not sure which is more performant. Implement a custom IComparer:

class MyComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        int xVal, yVal;
        var xIsVal = int.TryParse( x, out xVal );
        var yIsVal = int.TryParse( y, out yVal );

        if (xIsVal && yIsVal)   // both are numbers...
            return xVal.CompareTo(yVal);
        if (!xIsVal && !yIsVal) // both are strings...
            return x.CompareTo(y);
        if (xIsVal)             // x is a number, sort first
            return -1;
        return 1;               // x is a string, sort last
    }
}

var input = new[] {"a", "1", "10", "b", "2", "c"};
var e = input.OrderBy( s => s, new MyComparer() );

Or, split the sequence into numbers and non-numbers, then sort each subgroup, finally join the sorted results; something like:

var input = new[] {"a", "1", "10", "b", "2", "c"};

var result = input.Where( s => s.All( x => char.IsDigit( x ) ) )
                  .OrderBy( r => { int z; int.TryParse( r, out z ); return z; } )
                  .Union( input.Where( m => m.Any( x => !char.IsDigit( x ) ) )
                               .OrderBy( q => q ) );
LBushkin
  • 129,300
  • 32
  • 216
  • 265
  • Your IComparer doesn't return non-numeric strings in the correct (alphabetical) order. Your LINQ query does. – LukeH Jun 23 '09 at 14:50
  • I added my ending code in the OP. Also noticed the string thing. Furthermore I tried shortcirquiting before every parse. Don't know if it makes much performance sence, but it took me exactly as much effort to reorder them as it would have taken me to test it ;) – Boris Callens Jun 23 '09 at 15:16
  • Made code a whole lot shorter. By applying the system of short cirquiting (literally translated from Dutch "Kortsluitingsprincipe") I only do as much tryparses as needed. – Boris Callens Jun 23 '09 at 15:22
13

Perhaps you could go with a more generic approach and use a natural sorting algorithm such as the C# implementation here.

Nathan Baulch
  • 20,233
  • 5
  • 52
  • 56
  • 1
    Very cool indeed, I just found a Delphi wrapper for this too http://irsoft.de/web/strnatcmp-and-natsort-for-delphi – Peter Turner Dec 04 '09 at 16:43
  • This will not work in all cases. Suppose ypu have the following list of items: "0 / 30" "0 / 248" "0 / 496" "0 / 357.6". This order will be keept after sorting, which is not what you may expect. – AH. Oct 03 '14 at 07:30
  • 3
    Instead of pasting some urls you should add the code here to avoid dead links. Now this answer isn't more than a comment – Toshi Nov 07 '17 at 07:56
3

Use the other overload of OrderBy that takes an IComparer parameter.

You can then implement your own IComparer that uses int.TryParse to tell if it's a number or not.

Christian Hayter
  • 30,581
  • 6
  • 72
  • 99
3

I had a similar problem and landed here: sorting strings that have a numeric suffix as in the following example.

Original:

"Test2", "Test1", "Test10", "Test3", "Test20"

Default sort result:

"Test1", "Test10", "Test2", "Test20", "Test3"

Desired sort result:

"Test1", "Test2", "Test3, "Test10", "Test20"

I ended up using a custom Comparer:

public class NaturalComparer : IComparer
{

    public NaturalComparer()
    {
        _regex = new Regex("\\d+$", RegexOptions.IgnoreCase);
    }

    private Regex _regex;

    private string matchEvaluator(System.Text.RegularExpressions.Match m)
    {
        return Convert.ToInt32(m.Value).ToString("D10");
    }

    public int Compare(object x, object y)
    {
        x = _regex.Replace(x.ToString(), matchEvaluator);
        y = _regex.Replace(y.ToString(), matchEvaluator);

        return x.CompareTo(y);
    }
}   

Usage:

var input = new List<MyObject>(){...};
var sorted = input.OrderBy(o=>o.SomeStringMember, new NaturalComparer());

HTH ;o)

mike
  • 1,627
  • 1
  • 14
  • 37
  • this is exactly what needed to me but I have the list of string in inside the list type of object, can you show on how to use this method? – Rasik Sep 09 '20 at 14:52
  • @AakashBashyal I added an example to my answer above. – mike Sep 11 '20 at 08:33
  • just creating a list type object will sort the item inside the list? – Rasik Sep 11 '20 at 08:35
  • but there is error on `x = _regex.Replace(x.ToString, matchEvaluator);` which says `Argument 1: cannot convert from 'method group' to 'string'` – Rasik Sep 11 '20 at 08:38
  • @AakashBashyal sorry, my bad. i fixed the code. thanks! – mike Sep 14 '20 at 21:29
2

I'd say you could split up the values using a RegularExpression (assuming everything is an int) and then rejoin them together.

//create two lists to start
string[] data = //whatever...
List<int> numbers = new List<int>();
List<string> words = new List<string>();

//check each value
foreach (string item in data) {
    if (Regex.IsMatch("^\d+$", item)) {
        numbers.Add(int.Parse(item));
    }
    else {
        words.Add(item);
    }
}

Then with your two lists you can sort each of them and then merge them back together in whatever format you want.

hugoware
  • 35,731
  • 24
  • 60
  • 70
2

You could just use function provided by the Win32 API:

[DllImport ("shlwapi.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
static extern int StrCmpLogicalW (String x, String y);

and call it from an IComparer as others have shown.

Skizz
  • 69,698
  • 10
  • 71
  • 108
1
public static int? TryParse(string s)
{
    int i;
    return int.TryParse(s, out i) ? (int?)i : null;
}

// in your method
IEnumerable<string> input = new string[] {"a", "1","2", "b", "10"};
var list = input.Select(s => new { IntVal = TryParse(s), String =s}).ToList();
list.Sort((s1, s2) => {
    if(s1.IntVal == null && s2.IntVal == null)
    {
        return s1.String.CompareTo(s2.String);
    }
    if(s1.IntVal == null)
    {
        return 1;
    }
    if(s2.IntVal == null)
    {
        return -1;
    }
    return s1.IntVal.Value.CompareTo(s2.IntVal.Value);
});
input = list.Select(s => s.String);

foreach(var x in input)
{
    Console.WriteLine(x);
}

It still does the conversion, but only once/item.

Jonathan Rupp
  • 15,522
  • 5
  • 45
  • 61
1

You could use a custom comparer - the ordering statement would then be:

var result = input.OrderBy(s => s, new MyComparer());

where MyComparer is defined like this:

public class MyComparer : Comparer<string>
{
    public override int Compare(string x, string y)
    {

        int xNumber;
        int yNumber;
        var xIsNumber = int.TryParse(x, out xNumber);
        var yIsNumber = int.TryParse(y, out yNumber);

        if (xIsNumber && yIsNumber)
        {
            return xNumber.CompareTo(yNumber);
        }
        if (xIsNumber)
        {
            return -1;
        }
        if (yIsNumber)
        {
            return 1;
        }
        return x.CompareTo(y);
    }
}

Although this may seem a bit verbose, it encapsulates the sorting logic into a proper type. You can then, if you wish, easily subject the Comparer to automated testing (unit testing). It is also reusable.

(It may be possible to make the algorithm a bit clearer, but this was the best I could quickly throw together.)

Mark Seemann
  • 225,310
  • 48
  • 427
  • 736
1

You could also "cheat" in some sense. Based on your description of the problem, You know any String of length 2 will be a number. So just sort all the Strings of length 1. And then sort all the Strings of length 2. And then do a bunch of swapping to re-order your Strings in the correct order. Essentially the process will work as follows: (assuming your data is in an array.)

Step 1: Push all Strings of length 2 to the end of the array. Keeping track of how many you have.

Step 2: In place sort the Strings of length 1 and Strings of length 2.

Step 3: Binary search for 'a' which would be on the boundary of your two halves.

Step 4: Swap your two digit Strings with the letters as necessary.

That said, while this approach will work, does not involve regular expressions, and does not attempt to parse non-int values as an int -- I don't recommend it. You'll be writing significantly more code than other approaches already suggested. It obfuscates the point of what you are trying to do. It doesn't work if you suddenly get two letter Strings or three digit Strings. Etc. I'm just including it to show how you can look at problems differently, and come up with alternative solutions.

Rob Rolnick
  • 8,519
  • 2
  • 28
  • 17
1

Use a Schwartzian Transform to perform O(n) conversions!

private class Normalized : IComparable<Normalized> {
  private readonly string str;
  private readonly int val;

  public Normalized(string s) {
    str = s;

    val = 0;
    foreach (char c in s) {
      val *= 10;

      if (c >= '0' && c <= '9')
        val += c - '0';
      else
        val += 100 + c;
    }
  }

  public String Value { get { return str; } }

  public int CompareTo(Normalized n) { return val.CompareTo(n.val); }
};

private static Normalized In(string s) { return new Normalized(s); }
private static String Out(Normalized n) { return n.Value; }

public static IList<String> MixedSort(List<String> l) {
  var tmp = l.ConvertAll(new Converter<String,Normalized>(In));
  tmp.Sort();
  return tmp.ConvertAll(new Converter<Normalized,String>(Out));
}
Greg Bacon
  • 134,834
  • 32
  • 188
  • 245
  • Not really simpler then what I posted for all I know. Could be more performant, but it's not critical enough to put perf over simplicity – Boris Callens Jun 23 '09 at 15:26