0

I have a list of folder names that represent chapters, subchapters, sections, paragraphs and lines in a specification. A small sample of these folders looks like the following.

  • 1_1_1
  • 1_1_12
  • 1_1_2
  • 1_2_1
  • 1_2_1_3_1
  • 1_2_2

I need to write a function that sorts these numerically and taking account for hierarchical nesting. For instance the correct output of sorting the above would be.

  1. 1_1_1
  2. 1_1_2
  3. 1_1_12
  4. 1_2_1
  5. 1_2_1_3_1
  6. 1_2_2

Since this is very much the same way version numbers are sorted I have tried the following code which worked until it attempts to process an input with more than 4 sections (i.e. 1_2_1_3_1)

private List<Resource> OrderResources(List<Resource> list)
    {
        return list.OrderBy(v => System.Version.Parse(v.Id.Replace('_', '.'))).ToList();
    }

The error I get is

System.ArgumentException : Version string portion was too short or too long. (Parameter 'input')
Rob S.
  • 1,044
  • 7
  • 25

3 Answers3

3

Sorting is possible if you add n characters 0 between the digits.

Also You can use long and move the numbers n digits to the left, but then there will be a limit on the length of the number.

static void Main(string[] args)
{
    var chapter = new List<string>();
    chapter.Add("1_1_1");
    chapter.Add("1_1_12");
    chapter.Add("1_1_2");
    chapter.Add("1_2_1");
    chapter.Add("1_2_1_3_1");
    chapter.Add("1_2_2");
    var result = chapter.OrderBy(x=>SortCalc(x)).ToArray();
    foreach (var s in result)
    {
        Console.WriteLine($"{s}->{SortCalc(s)}");
    }
}

private static string SortCalc(string x, int count = 3)
{
    var array = x.Split('_').ToList();
    
    for (var index = 0; index < array.Count; index++)
    {
        var length = count - array[index].Length;
        if (length <=0)
            continue;
        array[index] = new string('0', length)+ array[index];
    }

    var num = string.Join("", array);
    return num;
}

Output will be

1_1_1->001001001
1_1_2->001001002
1_1_12->001001012
1_2_1->001002001
1_2_1_3_1->001002001003001
1_2_2->001002002
Stanislav
  • 459
  • 3
  • 6
1

I am posting this as a way to help the OP that has a problem with the solution posted at Natural Sort Order in C#

This answer is just an example on how to use that class described in that answer. So I am pretty close to a verbatim copy of the mentioned answer. But, in any case, please look at the comment in the question.

Here we go.

First the class NaturalStringComparer is taken in full from the answer linked above.

Next I add a fac-simile of the class Resource used by the OP and some code to initialize it

public class Resource
{ 
    public string ID { get; set; }
    .... other properties???....   
}

void Main()
{
    List<Resource> unordered = new List<Resource>
    {
        new Resource{ ID = "1_1_12"},
        new Resource{ ID = "1_2_1_3_1"},
        new Resource{ ID = "1_1_2"},
        new Resource{ ID = "1_2_2"},
        new Resource{ ID = "1_1_1"},
        new Resource{ ID = "1_2_1"},
    };

Now the call to the NaturalStringOrder is the following

var c = new NaturalStringComparer();
var r = unordered.OrderBy(u => u.ID, c);

And the results are

foreach(var x in r)
    Console.WriteLine(x.ID);

---------
1_1_1
1_1_2
1_1_12
1_2_1
1_2_1_3_1
1_2_2

I provide also a benchmark to check the performances.

Stopwatch sw = new Stopwatch();
sw.Start();
for (int x = 0; x < 1000000; x++)
{
    var c = new NaturalStringComparer();
    var r = unordered.OrderBy(u => u.ID, c);
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

I get 17ms, that looks pretty good.

Steve
  • 213,761
  • 22
  • 232
  • 286
1

I decided to implement it using Linq as well:

using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var chapter = new List<string>();
        chapter.Add("1_5");
        chapter.Add("1_4_1");
        chapter.Add("1_1_1");
        chapter.Add("1_1_12");
        chapter.Add("1_1_2");
        chapter.Add("1_2_1");
        chapter.Add("1_2_1_3_1");
        chapter.Add("1_2_2");
        chapter.Add("1_2_22");
        chapter.Add("1_2_21");
        
        var result = chapter
            .OrderBy(aX=> aX.Replace("_", "0"))
            .ToArray();
        
        foreach (var s in result)
        {
            Console.WriteLine($"{s} -> {s.Replace("_", "0")}");
        }
    }
}

Also maybe for some extra revs, the original implementation could benefit from Span<char> chars = stackalloc char[x.Length];. You can then skip the part while doing Split and make it:

using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var chapter = new List<string>();
        chapter.Add("1_5");
        chapter.Add("1_4_1");
        chapter.Add("1_1_1");
        chapter.Add("1_1_12");
        chapter.Add("1_1_2");
        chapter.Add("1_2_1");
        chapter.Add("1_2_1_3_1");
        chapter.Add("1_2_2");
        chapter.Add("1_2_22");
        chapter.Add("1_2_21");
        
        var spanResult = chapter
            .OrderBy(Sort)
            .ToArray();
        
        foreach (var s in spanResult)
        {
            Console.WriteLine($"{s} -> {Sort(s)}");
        }
    }

    static string Sort(string aValue)
    {
        const char zero = '0';
        const char underscore = '_';
        
        Span<char> chars = stackalloc char[aValue.Length];
        for (int i = 0; i < aValue.Length; i++)
        {
            if (aValue[i] == underscore)
            {
                if (i != aValue.Length - 1)
                {
                    chars[i] = zero;
                }
                
                continue;
            }
            chars[i] = aValue[i];
        }

        return chars.ToString();
    }
}

The output:

Linq
1 -> 1
1_ -> 10
1_1_1 -> 10101
1_1_12 -> 101012
1_1_2 -> 10102
1_2_1 -> 10201
1_2_1_3_1 -> 102010301
1_2_2 -> 10202
1_2_21 -> 102021
1_2_22 -> 102022
1_4_1 -> 10401
1_5 -> 105

Span
1 -> 1
1_ -> 1
1_1_1 -> 10101
1_1_12 -> 101012
1_1_2 -> 10102
1_2_1 -> 10201
1_2_1_3_1 -> 102010301
1_2_2 -> 10202
1_2_21 -> 102021
1_2_22 -> 102022
1_4_1 -> 10401
1_5 -> 105

The assumption in my implementation is that _ will only be applied in case there is "hierarchy", otherwise the implementation would need to check for that, of course.

I have added the updated implementation in case the "assumption" is not correct.

Dharman
  • 30,962
  • 25
  • 85
  • 135
D34NM
  • 101
  • 7
  • Well, my major concern with your solution (and the other one) is the calls to string methods (like string.Replace) that create new strings at each call. This, of course, is of negligible impact on few strings, but on a larger set could be something to keep under control. However good work. – Steve Nov 28 '20 at 11:24
  • @Steve, that is why I provided 2 solutions: one is using `Span` and the other `Linq`. Mostly as the "premature optimization is the root of all evil". The "optimized" one was there for the same reasons you said as well: to eliminate the "heavy" operations from the algorithm. I get what you mean ;) – D34NM Nov 28 '20 at 13:51