
The following list does not sort properly (IMHO):

$a = @( 'ABCZ', 'ABC_', 'ABCA' )
$a | sort
ABC_
ABCA
ABCZ

My handy ASCII chart and Unicode C0 Controls and Basic Latin chart have the underscore (low line) with an ordinal of 95 (U+005F). This is a higher number than the capital letters A-Z. Sort should have put the string ending with an underscore last.

Get-Culture returns en-US.

The next set of commands does what I expect:

$a = @( 'ABCZ', 'ABC_', 'ABCA' )
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABCA
ABCZ
ABC_

Now I create an ANSI encoded file containing those same 3 strings:

Get-Content -Encoding Byte data.txt
65 66 67 90 13 10  65 66 67 95 13 10  65 66 67 65 13 10
$a = Get-Content data.txt
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ

Once more the string containing the underscore/lowline is not sorted correctly. What am I missing?


Edit:

Let's call this example #4:

'A' -lt '_'
False
[char] 'A' -lt [char] '_'
True

Seems like both statements should be False or both should be True. I'm comparing strings in the first statement, and then comparing the Char type. A string is merely a collection of Char types so I think the two comparison operations should be equivalent.
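For anyone comparing the two statements side by side, here is a minimal sketch of where they can diverge. It assumes that `-lt` on strings applies linguistic (culture-aware) rules while `-lt` on `[char]` values compares code points; the variable names are illustrative only.

```powershell
# Comparing [char] values compares their numeric code points (65 vs 95).
$charResult = [char] 'A' -lt [char] '_'

# [string]::CompareOrdinal compares strings code point by code point,
# so a negative result means 'A' sorts before '_', matching the [char] case.
$ordinalResult = [string]::CompareOrdinal('A', '_')

# -lt on strings uses linguistic rules instead, which is why the
# question reports False for this comparison.
$stringResult = 'A' -lt '_'

"char: $charResult  ordinal: $ordinalResult  string: $stringResult"
```

If ordinal behavior is what you want for strings, calling `[string]::CompareOrdinal` explicitly sidesteps the operator's rules.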

And now for example #5:

Get-Content -Encoding Byte data.txt
65 66 67 90 13 10  65 66 67 95 13 10  65 66 67 65 13 10
$a = Get-Content data.txt
$b = @( 'ABCZ', 'ABC_', 'ABCA' )
$a[0] -eq $b[0]; $a[1] -eq $b[1]; $a[2] -eq $b[2];
True
True
True
[System.Collections.ArrayList] $al = $a
[System.Collections.ArrayList] $bl = $b
$al[0] -eq $bl[0]; $al[1] -eq $bl[1]; $al[2] -eq $bl[2];
True
True
True
$al.Sort( [System.StringComparer]::Ordinal )
$bl.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ
$bl
ABCA
ABCZ
ABC_

The two ArrayLists contain the same strings but are sorted differently. Why?

bretth
  • 2
    I think what you are missing is that you are expecting non-standard responses from Windows. It has always prioritized symbols before letters, just look at the file system. Make files with those names, sort by name, and it will sort them the same way with ABC_ being first. – TheMadTechnician Sep 08 '14 at 23:02
  • 5
    [String sorting is not done by ASCII code any more.](http://blogs.msdn.com/b/oldnewthing/archive/2004/05/18/134051.aspx) – Mike Zboray Sep 09 '14 at 03:59
  • Also, as far as I can tell, the weirdness with the second part has something to do with `ArrayList`. Using a strongly typed `System.Collections.Generic.List[string]` sorts as expected. Also, using a `string[]` sorts as expected with `Array::Sort`, but `object[]` does not. – Mike Zboray Sep 09 '14 at 04:39
  • You'll also have to confirm what `Get-Content data.txt` actually returns. – Mark Hurd Sep 10 '14 at 03:58
  • Thanks for suggesting the use of `System.Collections.Generic.List[string]` @mikez. It works, and if you want to post it as an answer I'll accept it. I just wish I understood the difference between my use of Get-Content and defining the array inline (see example #5). – bretth Sep 11 '14 at 21:16
  • Hmmm... I thought it could have been because one of the `ArrayList` instances was actually an `IListWrapper` or some other "hidden" difference (even though I can see the simple creation step), but if you apply `$aa=$al.ToArray()` and sort using `[System.Array]::Sort($aa, [System.StringComparer]::Ordinal )` (and equivalently for `$bl -> $ba`), then they still sort differently! – Mark Hurd Sep 20 '14 at 08:20
  • There is something "special" about `$a`. Even though each individual element has type `String`, you can't `.CopyTo` a `String` array! – Mark Hurd Sep 20 '14 at 08:39
  • And when you construct a six-element `Object` array and `.CopyTo` both `$a` and `$b` into it, `sort` refuses to compare some items... – Mark Hurd Sep 20 '14 at 09:06
  • 1
    Posted to Microsoft Connect as a bug. See https://connect.microsoft.com/PowerShell/feedbackdetail/view/974422 – bretth Sep 23 '14 at 20:53
  • Seems like adding a switch to allow 'ordinal' sorting to the `sort-object` cmdlet would be the solution. – Jeter-work Nov 04 '15 at 22:20

4 Answers

2

In many cases PowerShell wraps objects in, or unwraps them from, a PSObject. Usually this happens transparently and you never notice it, but in your case it is what causes the trouble.

$a='ABCZ', 'ABC_', 'ABCA'
$a|Set-Content data.txt
$b=Get-Content data.txt

[Type]::GetTypeArray($a).FullName
# System.String
# System.String
# System.String
[Type]::GetTypeArray($b).FullName
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject

As you can see, the objects returned by Get-Content are wrapped in PSObject, which prevents StringComparer from seeing the underlying strings and comparing them properly. A strongly typed string collection cannot store PSObjects, so PowerShell unwraps the strings before storing them in such a collection, which lets StringComparer see the strings and compare them properly.
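A minimal sketch of the strongly typed workaround, assuming a temporary file stands in for data.txt (the `$path` and `$list` names are illustrative, not from the question):

```powershell
# Recreate the three-line data file in a temporary location.
$path = [System.IO.Path]::GetTempFileName()
'ABCZ', 'ABC_', 'ABCA' | Set-Content $path

# Get-Content output may be PSObject-wrapped; adding each line to a
# List[string] makes PowerShell store plain System.String values.
$list = New-Object 'System.Collections.Generic.List[string]'
foreach ($line in Get-Content $path) { $list.Add($line) }

# With real strings in the collection, the ordinal comparer sees them directly.
$list.Sort([System.StringComparer]::Ordinal)
$list

Remove-Item $path
```

The same idea applies to an existing array: converting it to `[string[]]` before building the collection should hand the comparer unwrapped strings as well.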

Edit:

First of all, when you write $a[1].GetType() or $b[1].GetType() you are not calling .NET methods directly but PowerShell methods, which normally call the .NET methods on the wrapped object. Thus you cannot get the real type of an object this way. What is more, they can be overridden; consider this code:

$c='String'|Add-Member -Type ScriptMethod -Name GetType -Value {[int]} -Force -PassThru
$c.GetType().FullName
# System.Int32

Let us call the .NET method through reflection:

$GetType=[Object].GetMethod('GetType')
$GetType.Invoke($c,$null).FullName
# System.String
$GetType.Invoke($a[1],$null).FullName
# System.String
$GetType.Invoke($b[1],$null).FullName
# System.String

Now we get the real type of $c, but it says that the type of $b[1] is String, not PSObject. As I said, in most cases unwrapping is done transparently, so you see the wrapped String and not the PSObject itself. One particular case where this does not happen is when you pass an array: the array elements are not unwrapped. So let us add an additional level of indirection here:

$Invoke=[Reflection.MethodInfo].GetMethod('Invoke',[Type[]]([Object],[Object[]]))
$Invoke.Invoke($GetType,($a[1],$null)).FullName
# System.String
$Invoke.Invoke($GetType,($b[1],$null)).FullName
# System.Management.Automation.PSObject

Now that we pass $b[1] as part of an array, we can see its real type: PSObject. That said, I prefer to use [Type]::GetTypeArray instead.

About StringComparer: as you can see, when the two compared objects are not both strings, StringComparer relies on IComparable.CompareTo for the comparison. PSObject implements the IComparable interface, so the sort is done according to PSObject's IComparable implementation.

Community
user4003407
  • I think you're on to something. But the sort does rearrange the PSObject items, just not as I would expect. $a[1].GetType().Name and $b[1].GetType().Name both return "String". Can you point me to documentation with more details on arrays, PSObjects, and how the StringComparer might work when presented with PSObjects? Thanks. – bretth Nov 15 '15 at 00:55
  • 1
    @bretth I updated my answer. Sorry, I cannot point you to good documentation about that. A great part of my PowerShell knowledge was obtained through experimentation and digging with ILSpy. IMHO, PowerShell is really lacking documentation about many internal parts. – user4003407 Nov 15 '15 at 05:57
  • This may also be related? https://stackoverflow.com/questions/44731470/powershell-sorting-string-objects-with-a-special-character – JohnLBevan Jun 24 '17 at 00:24
0

Windows uses Unicode, not ASCII, so what you're seeing is the Unicode sort order for en-US. The general rules for sorting are:

  1. Numbers come first, then letters, with lowercase and uppercase intermixed.
  2. Special characters come before numbers.

Extending your example,

$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca' )

$a | sort-object
ABC_
ABC4
abca
ABCA
ABCZ
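To contrast this with the ordinal order the question asks for, here is a small sketch that sorts the same list both ways. It assumes the array is first copied into a `string[]` so the comparer receives plain strings; `$cultureSorted` and `$ordinalSorted` are illustrative names.

```powershell
$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca' )

# Culture-aware order, as produced by Sort-Object.
$cultureSorted = $a | Sort-Object

# Ordinal order: copy into a string[] and sort it in place by code point.
[string[]] $ordinalSorted = $a
[System.Array]::Sort($ordinalSorted, [System.StringComparer]::Ordinal)

"culture: $($cultureSorted -join ' ')"
"ordinal: $($ordinalSorted -join ' ')"
```

In the ordinal result the digit (52) comes first, then the capitals, then the underscore (95), and the all-lowercase string last because 'a' is 97: ABC4 ABCA ABCZ ABC_ abca.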
Mark Hurd
Brad Schoening
  • But the OP is explicitly asking for `Ordinal` order, and each individual Object in `$a` reports a type of `String`, but they don't fit in a `String` array. So, yes, we're getting the default Unicode ordering on `Object` instead of the requested `Ordinal` ordering. But why? – Mark Hurd Sep 30 '14 at 02:06
  • Unicode string sorting can be seen in practice here: http://minaret.info/test/sort.msp – Brad Schoening Sep 30 '14 at 18:15
0

If you really want to do this.... I will admit it's ugly but it works. I would create a function if this is something you need to do on a regular basis.

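$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ab1z' )
$ascii = @()

# Encode each string as zero-padded character codes so a text sort orders by code point.
foreach ($item in $a)
{
    $string = ""
    for ($i = 0; $i -lt $item.Length; $i++)
    {
        $char = [int] [char] $item[$i]
        $string += '{0:D5};' -f $char
    }

    $ascii += $string
}

$b = @()

# Sort the encoded strings, then decode each one back to text.
foreach ($item in $ascii | Sort-Object)
{
    $string = ""
    # Drop the trailing separator so Split does not yield an empty element.
    $array = $item.TrimEnd(";").Split(";")
    foreach ($char in $array)
    {
        $string += [char] [int] $char
    }

    $b += $string
}

$a
$b

ABCA
ABCZ
ABC_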

ScubaZip
-1

I tried the following and the sort is as expected:

[System.Collections.ArrayList] $al = [String[]] $a
Rizier123
retrolite