
I'm running

'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object

Shouldn't I have gotten a return of

s-a
S-tst
s-zaa
s-zf
srst2
ssrst

but instead I get the following:

s-a
srst2
ssrst
S-tst
s-zaa
s-zf

How is this possible? Does Sort-Object only look at letters when sorting? Is there any way to make it take special characters into account?

Armali
cdtekcfc
  • Interesting spot. Looks like MS Office has the same quirk; though it's by design. http://windowssecrets.com/forums/showthread.php/156283-Have-I-found-a-sorting-error-in-Excel – JohnLBevan Jun 24 '17 at 00:14
  • May be related: https://stackoverflow.com/questions/25734016/powershell-sort-of-strings-with-underscores ? – JohnLBevan Jun 24 '17 at 00:23
  • Sourcecode: https://github.com/PowerShell/PowerShell/blob/a291834f39a3de45b78e38153b3bafdf45f081a6/src/Microsoft.PowerShell.Commands.Utility/commands/utility/sort-object.cs – JohnLBevan Jun 24 '17 at 00:29
  • MS uses a `Word Sort` algorithm. https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx – JohnLBevan Jun 24 '17 at 01:13
  • Suggested improvement: https://github.com/PowerShell/PowerShell/issues/4098 – JohnLBevan Jun 24 '17 at 01:32
  • I don't think this is a duplicate (although related). The other post is asking how to sort by ASCII order. This is asking how to disable the word sorting. – Martin Bonner supports Monica Aug 31 '18 at 07:24

2 Answers


This behaviour is by design, but it is not always what people want or expect. If you want strings compared character by character in character-code (ordinal) order, ignoring case, use this:

Add-Type @"
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Globalization;

    public class SimpleStringComparer: IComparer, IComparer<string>
    {

        private static readonly CompareInfo compareInfo = CompareInfo.GetCompareInfo(CultureInfo.InvariantCulture.Name);

        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return compareInfo.Compare(x, y, CompareOptions.OrdinalIgnoreCase);
        }
        public SimpleStringComparer() {}
    }
"@


[string[]]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'

# Copy the array into a List[string] so it can be sorted in place with the custom comparer.
[System.Collections.Generic.List[string]]$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange($myList)
[SimpleStringComparer]$comparer = [SimpleStringComparer]::new()
$list.Sort($comparer)
$list

Outputs:

s'a
S'a
s'a1
S'a1
s-a
S-a
s-a1
S-a1
sa
Sa
sa1
Sa1
s^a
S^a
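
If compiling a class with Add-Type feels heavy, the built-in `System.StringComparer` comparers (the same ones mentioned in the comments below) may be enough. The following is a minimal sketch rather than a definitive fix: it assumes the question's sample data, and the variable names `$caseInsensitive` and `$caseSensitive` are made up for illustration.

```powershell
# The question's sample data, typed as string[] so Array.Sort can sort it in place.
[string[]]$caseInsensitive = 'S-tst','ssrst','srst2','s-zaa','s-a','s-zf'
[string[]]$caseSensitive   = $caseInsensitive.Clone()

# Ordinal comparison looks only at character codes, so '-' (0x2D) sorts before letters.
[Array]::Sort($caseInsensitive, [System.StringComparer]::OrdinalIgnoreCase)
[Array]::Sort($caseSensitive,   [System.StringComparer]::Ordinal)

$caseInsensitive -join ', '   # s-a, S-tst, s-zaa, s-zf, srst2, ssrst
$caseSensitive   -join ', '   # S-tst, s-a, s-zaa, s-zf, srst2, ssrst
```

The case-insensitive variant gives the order the question expected; the case-sensitive one puts every upper-case `S` (0x53) ahead of lower-case `s` (0x73).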

More Info

Per @TessellatingHeckler in the comments, casting each string to a char array might look like a way to sort in character code (ordinal) order. It is not a fix, though: hyphens and apostrophes are still ignored, so the result shows the same potentially unexpected ordering:

$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'
$myList | Sort-Object -Property { [char[]] $_ }

Outputs:

s'a
S'a
s-a
S-a
s'a1
S'a1
s-a1
S-a1
s^a
S^a
sa
Sa
sa1
Sa1

The current sorting behaviour is by design. It appears that PowerShell implements a "Word Sort". This is documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx#SortingFunctions

In addition to ignoring hyphens and apostrophes (except when comparing otherwise identical strings), this sort also treats punctuation characters as coming before alphanumerics, and handles accented letters alongside their counterparts. A simple demo of this can be seen like so:

32..255 | %{[string][char][byte]$_} | sort
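
To see the difference on a single pair of strings, the sketch below compares `'s-z'` with `'sr'` three ways. It is only an illustration: `$left`, `$right` and the result variable names are made up, and the word-sort result is deliberately left without an expected value because it depends on the .NET version and the globalization library in use. `CompareOptions.StringSort` is the .NET option documented as placing the hyphen and apostrophe before alphanumeric characters instead of ignoring them.

```powershell
$compareInfo = [System.Globalization.CultureInfo]::InvariantCulture.CompareInfo
$left  = 's-z'
$right = 'sr'

# Default culture-aware comparison (a word sort on Windows PowerShell).
# A positive result means 's-z' sorts after 'sr', i.e. the hyphen was ignored.
$wordSort = $compareInfo.Compare($left, $right, [System.Globalization.CompareOptions]::None)

# String sort: still culture-aware, but punctuation is not ignored.
$stringSort = $compareInfo.Compare($left, $right, [System.Globalization.CompareOptions]::StringSort)

# Ordinal: raw character codes, so '-' (0x2D) is less than 'r' (0x72) and the result is negative.
$ordinal = [string]::CompareOrdinal($left, $right)

"word sort: $wordSort, string sort: $stringSort, ordinal: $ordinal"
```

If the string-sort result matches the order you want, passing `CompareOptions.StringSort` in a custom comparer such as the one above keeps culture-aware handling of accented letters while no longer skipping hyphens.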

To define other sorting behaviours, you currently need to dip into .NET, like so:

Add-Type @"
    using System;
    using System.Runtime.InteropServices;
    using System.Collections;
    public class NumericStringComparer: IComparer
    {
        //https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
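        // StrCmpLogicalW takes wide (UTF-16) strings, so marshal them as Unicode.
        [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
        public static extern int StrCmpLogicalW(string psz1, string psz2);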
        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return StrCmpLogicalW(x, y);
        }
        public NumericStringComparer() {}
    }
"@

[System.Collections.ArrayList]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a', '100a','1a','001a','2a','20a'
$myList.Sort([NumericStringComparer]::new())
$myList -join ', '

The above is intended to sort strings the way Windows Explorer would (i.e. treating runs of digits as numeric values); the run reported here produced:

s'a, s'a1, S'a, s-a, S-a, S-a1, S'a1, s-a1, S^a, s^a, 1a, 001a, 2a, Sa, Sa1, sa, sa1, 20a, 100a

I've submitted a feature suggestion to provide more PS friendly solutions on Sort-Object. See https://github.com/PowerShell/PowerShell/issues/4098

JohnLBevan
  • Here is a [Jon Skeet explanation](https://stackoverflow.com/questions/21886555/unexpected-behavior-when-sorting-strings-with-letters-and-dashes) of the sorting behavior. As far as I can tell, `Sort-Object` accepts a `-Culture` parameter, but there is no culture I can find with an Ordinal sort, and creating a new custom culture requires admin rights and registering it system-wide before it can be used, so that leaves PS a bit stuck. – TessellatingHeckler Jun 24 '17 at 02:29
  • `$a = [System.Collections.ArrayList]@('srs', 's-a', 's-z'); $a.Sort([System.StringComparer]::Ordinal)` - from https://stackoverflow.com/q/18543842/478656 (possibly makes this Q a duplicate) – TessellatingHeckler Jun 24 '17 at 02:37
  • Can you clarify that `Sort-Object -Property { [char[]] $_ }` is not a fix, but a demonstration of the problem? – Martin Bonner supports Monica Aug 31 '18 at 07:32

You can achieve ASCII-style order by sorting on each string's hex representation:

'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object {Format-Hex -InputObject $_}

If you need the sort to be case-insensitive, you can lowercase each string first:

'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object {Format-Hex -InputObject $_.ToLower()}
kletnoe