4

I need to sort a list with multilevel indexes

(@('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") | Sort-Object) -join ", "

1, 1.1.1, 1.99, 10, 2, 2.5, 3, 5.5

I came up with such a solution, but it does not work with indexes above ten

(@('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") | Sort-Object {
    $chanks = $_.Split('.')
    $sum = 0
    for ($i = 0; $i -lt $chanks.Count; $i++) {
        $sum += [int]$chanks[$i] / [math]::Pow(10, $i)
    }
    $sum
}) -join ", "

1, 1.1.1, 2, 2.5, 3, 5.5, 10, 1.99

Lance U. Matthews
  • 15,725
  • 6
  • 48
  • 68

4 Answers4

4

Based on Lance U. Matthews comment to your question ... | Sort-Object { [int] $_.Split('.')[0] }, { [int] $_.Split('.')[1] }, { [int] $_.Split('.')[2] },

here is a way to do the same thing while not knowing the number of nested levels.

It is done by checking the max nesting level and storing it into a variable, then building dynamically an array of scriptblock to sort against.

$Arr = @('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10", '12.9.3.1.5', '176', '12.9.9', '2.1') 
$MaxDots = ($Arr | % { $_.Split('.').count } | Measure-Object  -Maximum).Maximum

$sbl = for ($i = 0; $i -lt $MaxDots; $i++) {
  { [int] $_.Split('.')[$i] }.GetNewClosure()
}

$Arr | Sort-Object $sbl
Sage Pourpre
  • 9,932
  • 3
  • 27
  • 39
  • Aha, you beat me to it. I'm trying to think if there's a way to calculate `$MaxDots` in a `ForEach-Object` so it can be done as part of the pipeline, and then pass the result to `Sort-Object` in a pipeline variable. But then I suppose that wouldn't help with constructing the needed number of `[ScriptBlock]`s ahead of time. – Lance U. Matthews Feb 23 '22 at 07:38
  • @LanceU.Matthews Yeah. I guess your only option is to ditch the `sort-object` completely and do a `Foreach`, implement your own sorting logic where you rearrange the records in the array yourself. You'd do only 1 pass... But it is starting to get a bit too much involved :') – Sage Pourpre Feb 23 '22 at 08:06
  • Love this answer, very clever. You could also use `{ [int] $_.Split('.')[$i] }.GetNewClosure()` I believe, it might be more readable – Santiago Squarzon Feb 23 '22 at 15:49
  • 1
    @SantiagoSquarzon Definitely. Thanks. I like it. I updated my answer accordingly. There's a bit of blur in my mind regarding the `GetNewClosure`. I'll read up and do some tests to master it. (eg: I understand that it bind external variables used in the scriptblock to the scriptblock itself so a fresh value is obtained when iterating but it clearly did not try to bind `$_` otherwise it would have created issues so I'm pondering whether it just ignore that one, all automatic variables or all undefined variables at "new closure time". *Testing time* ... And thanks again for the improvement :) – Sage Pourpre Feb 23 '22 at 16:19
  • My pleasure @SagePourpre :) I believe it's because `$_` doesn't exist at the moment of closure hence ignored but since `$i` is defined then the value is locked in each sblock and the sblock "remembers" that value – Santiago Squarzon Feb 23 '22 at 16:24
3

Here's a couple of solutions with some caveats.

The -Property parameter of Sort-Object accepts an array, so you can specify "sort by...then by...". If you know the maximum number of "sub-indices" is 2 (i.e. x.y.z) then you can break the string apart into components separated by . and then sort successively by each component as an integer. It's repetitive, but it works...

(
    @('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") |
        Sort-Object -Property {
            # Sort by the first component
            return [Int32] $_.Split('.')[0]
        }, {
            # Sort by the second component
            return [Int32] $_.Split('.')[1]
        }, {
            # Sort by the third component
            return [Int32] $_.Split('.')[2]
        }
) -join ', '

If a component is not specified (e.g. '1.2'.Split('.')[2]) then, helpfully, it becomes 0 when casting to an [Int32].

Here's an alternative that only uses one [ScriptBlock] but it also requires that the maximum length of a sub-index in digits is known, too...

$maxComponentCount = 3
$maxComponentDigits = 2

# Create a string of repeated '0's $maxComponentDigits long
$emptyComponentText = [String]::new([Char] '0', $maxComponentDigits)

(
    @('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") |
        Sort-Object -Property {
            $components = [String[]] (
                $_.Split('.').ForEach(
                    # Pad each component to $maxComponentDigits digits
                    { $_.PadLeft($maxComponentDigits, '0') }
                )
            );
            
            if ($components.Length -lt $maxComponentCount)
            {
                # Pad $components up to $maxComponentCount with $emptyComponentText elements
                $components += ,$emptyComponentText * ($maxComponentCount - $components.Length)
            }

            # Join components - now $maxComponentCount elements of $maxComponentDigits digits - back into an index string
            return $components -join '.'
        }
) -join ', '

That is padding each input index to have the same number of sub-indices and each sub-index to have the same number of digits and then relying on lexical sorting to put them in the correct order, so this...

@('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10")

...gets sorted as if it looked like this...

@('01.00.00', '02.00.00', '03.00.00', '01.01.01', '01.99.00', '02.05.00', '05.05.00', "10.00.00")

I suppose you could set both $maxComponentCount and $maxComponentDigits to, say, 100 if neither value is known, but that feels like a clunky workaround (with performance implications, too). I'll try to think of something better.

Lance U. Matthews
  • 15,725
  • 6
  • 48
  • 68
3
'1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10" |Sort-Object { '{0:000}{1:000}{2:000}{3:000}' -f $([int[]]("$_.0.0.0".Split('.'))) }
iRon
  • 20,463
  • 10
  • 53
  • 79
2

Absolute sort keys vs. relative sort direction

The problem we're bumping into is that each [ScriptBlock] passed to Sort-Object only receives a single input value at a time and returns a key value on which to sort, but with each index having a variable number of sub-indices we can't predict to how many levels we need to compare when there's values we haven't yet seen. What we need is a way to define how two values should be sorted relative to each other.

Fortunately, .NET, upon which (Windows) PowerShell is based, defines the [IComparer[]] interface that is used by various methods to determine the sort order of values. Its lone method, Compare(), is passed two values and returns whether they are equal or, otherwise, which one comes before the other. All we have to do is provide our own (simple) implementation of [IComparer[]] and then we can pass it to a built-in sorting method.

[IComparer[]] implementation, usage, and test output

[HierarchicalIndexComparer]: a .NET [IComparer[]] for index strings

# Source: https://stackoverflow.com/a/71237565/150605
class HierarchicalIndexComparer : System.Collections.Generic.Comparer[String]
{
    <#
        Implements Comparer[String].Compare()
        See https://learn.microsoft.com/dotnet/api/system.collections.generic.comparer-1.compare
            When $x -lt $y, returns negative integer
            When $x -eq $y, returns 0
            When $x -gt $y, returns positive integer
    #>
    [Int32] Compare([String] $x, [String] $y)
    {
        # Split each index into components converted to integers with validation
        [Int32[]] $xValues = [HierarchicalIndexComparer]::GetComponentValues($x)
        [Int32[]] $yValues = [HierarchicalIndexComparer]::GetComponentValues($y)
        [Int32] $componentsToCompare = [Math]::Min($xValues.Length, $yValues.Length)

        for ($i = 0; $i -lt $componentsToCompare; $i++)
        {
            [Int32] $componentCompareResult = $xValues[$i].CompareTo($yValues[$i])

            # Sort $x and $y by the current component values if they are not equal
            if ($componentCompareResult -ne 0)
            {
                return $componentCompareResult
            }

            # Otherwise, continue with the next component
        }

        # The first $componentsToCompare elements of $x and $y are equal
        # Sort $x and $y by their component count
        return $xValues.Length.CompareTo($yValues.Length)
    }

    hidden static [Int32[]] GetComponentValues([String] $index)
    {
        return [Int32[]] (
            $index.Split('.').ForEach(
                {
                    if ($_.Length -lt 1)
                    {
                        throw "Index string ""$index"" contains an empty sub-index."
                    }

                    [Int32] $value = -1

                    # Leading zeroes will be removed by parsing and not considered when comparing components
                    if (-not [Int32]::TryParse($_, [System.Globalization.NumberStyles]::None, [System.Globalization.CultureInfo]::InvariantCulture, [Ref] $value))
                    {
                        throw "Sub-index ""$_"" of string ""$index"" contains non-digit characters."
                    }

                    return $value
                }
            )
        )
    }
}

You can see I used a PowerShell class to provide an implementation of the [IComparer[]] interface for sorting index strings; it derives from the recommended [Comparer[]] class. See the last section of this answer for an explanation of the logic used by [HierarchicalIndexComparer].

Once defined the class can then be referenced as...

[HierarchicalIndexComparer]

...and instantiated with...

New-Object -TypeName 'HierarchicalIndexComparer'

...or...

[HierarchicalIndexComparer]::new()

Sorting test index strings with a [HierarchicalIndexComparer]

Now that we have our own [IComparer[]] implementation we can pass an instance of it to a sorting method so it's able to sort index strings. LINQ is a .NET technology that allows you to perform operations on sequences in much the same way as the Group-Object, Select-Object, and Where-Object cmdlets do with PowerShell pipelines. LINQ provides two methods, OrderBy() and OrderByDescending(), to perform primary sorting of sequences. After defining some test index strings we just need to pass them and our comparer to one of those methods to get sorted output...

[String[]] $initial = 1, 10, 100 |
    ForEach-Object -Process { "$_", "$_.0", "$_.0.0", "$_.0.0.0", "$_.0.1", "$_.1", "$_.1.0", "$_.1.1" }
# Shuffle the elements into a "random" order that is the same between runs
[String[]] $shuffled = Get-Random -Count $initial.Length -InputObject $initial -SetSeed 12345
[String[]] $sorted = [System.Linq.Enumerable]::OrderBy(
    $shuffled,                                         # The values to be ordered
    [Func[String, String]] {                           # Selects the key by which to order each value
        param($value)

        return $value                                  # Return the value as its own key
    },
    (New-Object -TypeName 'HierarchicalIndexComparer') # The comparer to perform the ordering
) | ForEach-Object -Process {
    # Just for demonstration purposes to show that further pipeline elements can be used after sorting
    return $_
}

for ($index = 0; $index -lt $initial.Length; $index++)
{
    [PSCustomObject] @{
        '#'         = $index
        '$initial'  = $initial[$index]
        '$shuffled' = $shuffled[$index]
        '$sorted'   = $sorted[$index]
    }
}

The advantage of using [Enumerable]::OrderBy() with a custom [IComparer[]] is that it will provide proper sorting with one pass through your index strings, plus you can pipe the output into subsequent cmdlets.

Briefly, the three parameters being passed to OrderBy() are...

  1. The sequence of index strings to be sorted
  2. The "key" on which each index is sorted
    • We want each index string to be sorted on the string itself (since [HierarchicalIndexComparer] determines the ordering of index [String] instances), so we just return the same value that was passed to us; this is similar to ... | Sort-Object -Property { $_ }
    • See Can LINQ be used in PowerShell? for more information on what's going on here
  3. An instance of our custom index comparer

You can also use the Array::Sort() static method, although that sorts the array passed to it in-place and doesn't return anything for a pipeline to process. Here I'll create a copy of the array first so it can be sorted separately...

# Create a copy of the $shuffled array named $sorted
$sorted = New-Object -TypeName 'System.String[]' -ArgumentList $shuffled.Length
[Array]::Copy($shuffled, $sorted, $shuffled.Length)

[Array]::Sort(
    $sorted,                                                      # The array to be sorted
    (New-Object -TypeName 'SO71232189.HierarchicalIndexComparer') # The comparer with which to sort it
)

Sorted result of test index strings using [HierarchicalIndexComparer]

The above script produces three arrays showing the different transformations of the index collection...

# $initial $shuffled $sorted
0 "1" "100.1" "1"
1 "1.0" "10.0.0.0" "1.0"
2 "1.0.0" "100.0.1" "1.0.0"
3 "1.0.0.0" "1.0.0.0" "1.0.0.0"
4 "1.0.1" "100.0" "1.0.1"
5 "1.1" "10.0.0" "1.1"
6 "1.1.0" "1.1" "1.1.0"
7 "1.1.1" "100.1.1" "1.1.1"
8 "10" "1.0" "10"
9 "10.0" "100.0.0" "10.0"
10 "10.0.0" "1" "10.0.0"
11 "10.0.0.0" "100.1.0" "10.0.0.0"
12 "10.0.1" "10.1.0" "10.0.1"
13 "10.1" "10.0" "10.1"
14 "10.1.0" "10.1.1" "10.1.0"
15 "10.1.1" "100.0.0.0" "10.1.1"
16 "100" "10.1" "100"
17 "100.0" "1.1.1" "100.0"
18 "100.0.0" "1.1.0" "100.0.0"
19 "100.0.0.0" "10" "100.0.0.0"
20 "100.0.1" "1.0.1" "100.0.1"
21 "100.1" "10.0.1" "100.1"
22 "100.1.0" "1.0.0" "100.1.0"
23 "100.1.1" "100" "100.1.1"

Additional notes

[HierarchicalIndexComparer] ordering logic

In short, the Compare() method...

[Int32] Compare([String] $x, [String] $y)
{
    # ...
}

...is splitting each index ($x and $y) into [Int32] components, comparing as many components as they have in common, sorting based on any components that are unequal, and if all common components are equal then sorting the shorter index first.

So, for example, when comparing "1.2.3" to "1.2" it breaks them up into "1", "2", "3" and "1", "2", respectively, and then performs the following comparisons:

Component $x $y Comparison result
0 1 1 Equal
1 2 2 Equal

Since all 2 components of $y equal the first 2 components of $x, it now falls to the index with less components sorting first; $x has 3 components whereas $y has 2, therefore $x sorts after $y, which is correct: "1.2.3" > "1.2".

When comparing "1.2.3" to "1.20.4" it breaks them up into "1", "2", "3" and "1", "20", "4", respectively, and then performs the following comparisons:

Component $x $y Comparison result
0 1 1 Equal
1 2 20 $x component is less than $y component
2 3 4 Not evaluated

Since when comparing the first 3 components of $x and $y we find that component 1 of each is unequal, we return the result of that comparison; $x[1] < $y[1] therefore $x < $y, which is correct: "1.2.3 < 1.20.4".

Lance U. Matthews
  • 15,725
  • 6
  • 48
  • 68