
I recently answered an SO question about using -lt or -gt with strings. My answer was based on something I'd read earlier, which said that -lt compares one char from each string at a time until the ASCII values differ. At that point the result (lower/equal/greater) decides. By that logic, "Less" -lt "less" should return True because L has a lower ASCII byte value than l, but it doesn't:

[System.Text.Encoding]::ASCII.GetBytes("Less".ToCharArray())
76
101
115
115

[System.Text.Encoding]::ASCII.GetBytes("less".ToCharArray())
108
101
115
115

"Less" -lt "less"
False

It seems that I may have been missing a crucial piece: the test is case-insensitive.

#L has a lower ASCII-value than l. PS doesn't care. They're equal
"Less" -le "less"
True

#The last s has a lower ASCII-value than t. PS cares.
"Less" -lt "lest"
True

#T has a lower ASCII-value than t. PS doesn't care
"LesT" -lt "lest"
False

#Again PS doesn't care. They're equal
"LesT" -le "lest"
True

I then tried to test char vs single-character-string:

[int][char]"L"
76

[int][char]"l"
108


#Using string it's case-insensitive. L = l
"L" -lt "l"
False

"L" -le "l"
True

"L" -gt "l"
False

#Using chars it's case-sensitive! L < l
([char]"L") -lt ([char]"l")
True

([char]"L") -gt ([char]"l")
False

For comparison, I tried to use the case-sensitive less-than operator, but it says L > l which is the opposite of what -lt returned for chars.

"L" -clt "l"
False

"l" -clt "L"
True

How does the comparison work? It clearly doesn't use ASCII values. And why does it behave differently for chars vs. strings?

Binarus
Frode F.
  • BTW, comparison operators are not only case-insensitive by default (which is clearly documented in the [`about_Comparison_Operators`](https://technet.microsoft.com/library/hh847759.aspx) help topic), but also properly compare composite characters in different forms: `'ё' -eq 'Ё'`. – user4003407 Mar 19 '16 at 01:00
  • Good point. The fact that operators are case-insensitive by default was what led me to test that first, but considering that `-lt` doesn't have a case-sensitive operator like `clike`, `cmatch` etc., it wasn't 100% obvious it should be case-insensitive. Actually `-clt`, `-ilt` etc. exist (undocumented), but they return the same as `-lt` as far as I can tell. Guessing they're just aliases. – Frode F. Mar 19 '16 at 01:31
  • *By default, all comparison operators are case-insensitive. To make a comparison operator case-sensitive, precede the operator name with a "c". For example, the case-sensitive version of "-eq" is "-ceq". To make the case-insensitivity explicit, precede the operator with an "i". For example, the explicitly case-insensitive version of "-eq" is "-ieq".* That is from the link in my previous comment. So `-clt` and `-ilt` are documented. And they also return different results: `'A'-cle'a'` and `'A'-ile'a'`. – user4003407 Mar 19 '16 at 01:45
  • Getting late here I see so I missed that. :-) `"L" -clt "l"` still doesn't work though. – Frode F. Mar 19 '16 at 01:53
  • I did myself a favor and split them into two SO-questions as it got too complex. The `trace-command` part is moved to http://stackoverflow.com/questions/36099167/trace-debug-powershell-operators – Frode F. Mar 19 '16 at 07:38
  • `System.Char` is just a special numeric type, so it is compared numerically, not as a string. For example: `'AaBb'.GetEnumerator()|sort -CaseSensitive` returns `A`, `B`, `a`, `b`, while `'A','a','B','b'|sort -CaseSensitive` returns `a`, `A`, `b`, `B`. And string comparison does not work on a char-by-char basis: `&{$a='A','a','B','b';foreach($b in $a){foreach($c in $a){$b+$c}}}|sort -CaseSensitive` places `AA` before `ab`, although `a` is placed before `A` when they stand alone. – user4003407 Mar 19 '16 at 08:05
  • Thanks! That explains why char vs string behaves differently, as you confirm that char comparison is done using int values like I expected. Still wondering how it compares the strings behind the scenes. Do you know? And how can I find the proof myself? And as usual, I would appreciate it if you provided this as an answer. :-) – Frode F. Mar 19 '16 at 08:14
  • I've found that `Sort-Object` (after peeling off 100 layers) uses `[cultureinfo]::CurrentUICulture.CompareInfo.Compare()`, which uses `String.Compare()`, but that is not a valid answer as I'm interested in `-lt`/`-gt`. They probably work the same as sort, but I want proof. :-) – Frode F. Mar 19 '16 at 09:26

2 Answers


A big thank-you to PetSerAl for all his invaluable input.

tl;dr:

  • -lt and -gt compare [char] instances numerically by Unicode codepoint.

    • Confusingly, so do -ilt, -clt, -igt, -cgt - even though they only make sense with string operands, but that's a quirk in the PowerShell language itself (see bottom).
  • -eq (and its alias -ieq), by contrast, compare [char] instances case-insensitively, which is typically, but not necessarily like a case-insensitive string comparison (-ceq again compares strictly numerically).

    • -eq/-ieq ultimately also compares numerically, but first converts the operands to their uppercase equivalents using the invariant culture; as a result, this comparison is not fully equivalent to PowerShell's string comparison, which additionally recognizes so-called compatible sequences (distinct characters or even sequences considered to have the same meaning; see Unicode equivalence) as equal.
    • In other words: PowerShell special-cases the behavior of only -eq / -ieq with [char] operands, and does so in a manner that is almost, but not quite the same as case-insensitive string comparison.
  • This distinction leads to counter-intuitive behavior such as [char] 'A' -eq [char] 'a' and [char] 'A' -lt [char] 'a' both returning $true.

  • To be safe:

    • always cast to [int] if you want numeric (Unicode codepoint) comparison.
    • always cast to [string] if you want string comparison.
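
As a minimal sketch of the "to be safe" advice above (the variable names are only illustrative), explicit casts make the intended kind of comparison unambiguous for `[char]` operands:

```powershell
$upper = [char] 'A'
$lower = [char] 'a'

# Numeric comparison: cast to [int] to compare Unicode codepoints (65 vs. 97).
$numericLess  = ([int] $upper) -lt ([int] $lower)    # $true
$numericEqual = ([int] $upper) -eq ([int] $lower)    # $false

# String comparison: cast to [string]; -lt and -eq are then case-insensitive.
$stringLess  = ([string] $upper) -lt ([string] $lower)   # $false: equal when case is ignored
$stringEqual = ([string] $upper) -eq ([string] $lower)   # $true
```

With the casts in place, "less than" and "equal" can no longer both be true for the same pair of operands.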

For background information, read on.


PowerShell's usually helpful operator overloading can be tricky at times.

Note that in a numeric context (whether implicit or explicit), PowerShell treats characters ([char] ([System.Char]) instances) numerically, by their Unicode codepoint (not ASCII).

[char] 'A' -eq 65  # $true, in the 'Basic Latin' Unicode range, which coincides with ASCII
[char] 'Ā' -eq 256 # $true; 0x100, in the 'Latin Extended-A' Unicode range

What makes [char] unusual is that its instances are compared to each other numerically as-is, by Unicode codepoint, EXCEPT with -eq/-ieq.

  • -ceq, -lt, and -gt compare directly by Unicode codepoints, and - counter-intuitively - so do -ilt, -clt, -igt and -cgt:
[char] 'A' -lt [char] 'a'  # $true; Unicode codepoint 65 ('A') is less than 97 ('a')
  • -eq (and its alias -ieq) first transforms the characters to uppercase, then compares the resulting Unicode codepoints:
[char] 'A' -eq [char] 'a' # !! ALSO $true; equivalent of 65 -eq 65

It's worth reflecting on this Buddhist turn of "this and that": in the world of PowerShell, character 'A' is both less than and equal to 'a', depending on how you compare.

Also, directly or indirectly - after transformation to uppercase - comparing Unicode codepoints is NOT the same as comparing them as strings, because PowerShell's string comparison additionally recognizes so-called compatible sequences, where characters (or even character sequences) are considered "the same" if they have the same meaning (see Unicode equivalence); e.g.:

# Distinct Unicode characters U+2126 (Ohm Sign) and U+03A9 (Greek Capital Letter Omega)
# ARE recognized as the "same thing" in a *string* comparison:
"Ω" -ceq "Ω"  # $true, despite having distinct Unicode codepoints

# -eq/-ieq: with [char], by only applying transformation to uppercase, the results
# are still different codepoints, which - compared numerically - are NOT equal:
[char] 'Ω' -eq [char] 'Ω' # $false: uppercased codepoints differ

# -ceq always applies direct codepoint comparison.
[char] 'Ω' -ceq [char] 'Ω' # $false: codepoints differ
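
Because the two omega-like glyphs are easy to confuse when copied, here is a sketch of the same experiment with the characters built from their codepoints (the variable names are only illustrative). The `[char]` results follow from the numeric rules described above; the string result is the one this answer reports:

```powershell
$ohm   = [char] 0x2126   # Ohm Sign
$omega = [char] 0x03A9   # Greek Capital Letter Omega

# [char] operands: -eq uppercases both sides and then compares codepoints; -ceq compares as-is.
$charEq  = $ohm -eq $omega     # $false: the uppercased codepoints still differ
$charCeq = $ohm -ceq $omega    # $false: the codepoints differ

# Uppercasing leaves the Ohm Sign unchanged, so it cannot become U+03A9.
$ohmUpper = [int] [char]::ToUpperInvariant($ohm)   # 8486, i.e. 0x2126

# String operands: reported above as $true, because string comparison
# recognizes the two characters as equivalent.
$stringCeq = ([string] $ohm) -ceq ([string] $omega)
```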

Note that use of prefixes i or c to explicitly specify case-matching behavior is NOT sufficient to force string comparison, even though conceptually operators such as -ceq, -ieq, -clt, -ilt, -cgt, -igt only make sense with strings.

Effectively, the i and c prefixes are simply ignored when applied to -lt and -gt while comparing [char] operands; as it turns out (unlike what I originally thought), this is a general PowerShell pitfall - see below for an explanation.

As an aside: -lt and -gt logic in string comparison is not numeric, but based on collation order (a human-centric way of ordering independent of codepoints / byte values), which in .NET terms is controlled by cultures (either by default by the one currently in effect, or by passing a culture parameter to methods).
As @PetSerAl demonstrates in a comment (and unlike what I originally claimed), PS string comparisons use the invariant culture, not the current culture, so their behavior is the same, irrespective of what culture is the current one.
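
To see the difference between collation order and codepoint order directly, the following sketch calls `[string]::Compare` with an explicit `[System.StringComparison]` value. It assumes, as @PetSerAl's comments state, that `-clt` on strings corresponds to an `InvariantCulture` comparison; only the sign of each result matters here:

```powershell
# Collation order (invariant culture): lowercase 'l' sorts before uppercase 'L'.
$byCulture = [string]::Compare('L', 'l', [System.StringComparison]::InvariantCulture)

# Ordinal order: plain codepoint comparison, 76 ('L') vs. 108 ('l').
$byOrdinal = [string]::Compare('L', 'l', [System.StringComparison]::Ordinal)

# The case-sensitive string operator agrees with the collation result, not the ordinal one.
$upperFirst = 'L' -clt 'l'    # $false
$lowerFirst = 'l' -clt 'L'    # $true
```

Here `$byCulture` is positive ('L' sorts after 'l') while `$byOrdinal` is negative, which matches the seemingly contradictory results in the question.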


Behind the scenes:

As @PetSerAl explains in the comments, PowerShell's parsing doesn't distinguish between the base form of an operator and its i-prefixed form; e.g., both -lt and -ilt are translated to the same value, Ilt.
Thus, PowerShell cannot implement differing behavior for -lt vs. -ilt, -gt vs. -igt, ..., because it treats them the same at the syntax level.
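
The parser behavior can be checked with the AST expression from @PetSerAl's comment. In this sketch, `Get-OperatorKind` is a hypothetical helper written only for illustration, not a built-in command:

```powershell
# Hypothetical helper: returns the TokenKind that the parser assigned to the
# binary operator in a script block containing a single binary expression.
function Get-OperatorKind {
    param([scriptblock] $ScriptBlock)
    $ScriptBlock.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.Operator
}

$plainKind = Get-OperatorKind { $a -lt $b }     # Ilt
$iKind     = Get-OperatorKind { $a -ilt $b }    # Ilt: same token kind as plain -lt
$cKind     = Get-OperatorKind { $a -clt $b }    # Clt: only the c-prefixed form is distinct
```

The script blocks are only parsed, never run, so `$a` and `$b` need not be defined.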

This leads to somewhat counter-intuitive behavior in that operator prefixes are effectively ignored when comparing data types where case-sensitivity has no meaning - as opposed to getting coerced to strings, as one might expect; e.g.:

"10" -cgt "2"  # $false, because "2" comes after "1" in the collation order

10 -cgt 2  # !! $true; *numeric* comparison still happens; the `c` is ignored.

In the latter case I would have expected the use of -cgt to coerce the operands to strings, given that case-sensitive comparison is only a meaningful concept in string comparison, but that is NOT how it works.

If you want to dig deeper into how PowerShell operates, see @PetSerAl's comments below.

Community
mklement0
  • Is there any benefit to referencing the .NET sources regarding char and string here to correlate how PowerShell is returning results? http://referencesource.microsoft.com/#mscorlib/system/char.cs,b0e89607f165d052 and http://referencesource.microsoft.com/#mscorlib/system/string.cs,b42f029a93d6432a – Kory Gill Mar 19 '16 at 19:06
  • @KoryGill: Thanks for the links, but I think the behavior described is really on the PowerShell side. (I'm inferring that from the examples I have given; I have not tried to trace execution or dig into the source code). If something doesn't ring true, do tell me. – mklement0 Mar 19 '16 at 19:12
  • Thanks for a thorough answer. I now understand how the chars compare (int comparison). About the last part using char and `-clt` etc. To verify: you expected the comparison of the chars to behave like strings because `-clt/-ilt` only work with strings, right? Still having some trouble with strings though. Where can I find the "collation order" for my culture? Are strings compared with `-lt/-gt` affected by culture even though it's case-insensitive? `"Āa" -cgt "Aa"` is True. Apparently, accented-A is greater than A. Why? – Frode F. Mar 19 '16 at 19:45
  • @FrodeF.: Yes, I expect `-ilt`, `-clt`, ... to force a string comparison, because that's the only meaningful interpretation. Think of collation order as telephone directory-style sorting: it is independent of codepoints and instead focuses on how _humans_ order (sort) strings, which typically means grouping uppercase and lowercase letters together, as well as grouping letters together that share the same base letter and differ only by diacritics (e.g., "A" and "Ä"). This is a complex topic, and you should probably ask a separate question if this doesn't satisfy you. – mklement0 Mar 19 '16 at 20:11
  • @FrodeF.: `[System.Globalization.CultureInfo]::CurrentCulture.CompareInfo` and https://msdn.microsoft.com/en-us/library/system.globalization.sortkey.aspx may get you started. – mklement0 Mar 19 '16 at 20:12
  • @FrodeF. If you're thinking about decompiling some PowerShell code, a good place to start would be the `System.Management.Automation.Language.PSBinaryOperationBinder.CompareXX` methods. These methods end up calling `BinaryComparisonCommon` for numeric comparison, `BinaryEqualityComparison` for non-numeric equality comparison and `BinaryComparision` for non-numeric relative comparison. `BinaryEqualityComparison` has a special case for comparing `char`s in a case-insensitive manner. There is no special case for `char` in `BinaryComparision`; it just ends up calling `IComparable.CompareTo`. – user4003407 Mar 19 '16 at 22:00
  • @mklement0 As of PS v5, both `BinaryEqualityComparison` and `BinaryComparision` specifically use `InvariantCulture` when comparing strings, not `CurrentCulture` or `CurrentUICulture`: `[cultureinfo]::CurrentCulture='tr'; [string]::Equals('i','I','CurrentCultureIgnoreCase'); 'i'-ieq'I'`. Also, the `System.Management.Automation.Language.TokenKind` enum does not have special values for non-prefixed comparison operators: `{$a-eq$b}.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.Operator` returns `Ieq`, so it would be kind of problematic to provide different behavior for `-ilt` vs `-lt`. – user4003407 Mar 19 '16 at 22:16
  • Thanks for great comments as usual! :) Can I ask how you found e.g. `System.Management.Automation.Language.PSBinaryOperationBinder.CompareXX`? Did you just look around/search in libraries, or did you use a PowerShell command/property of some sort (like `ast` etc.) to get you on the right track? – Frode F. Mar 20 '16 at 10:39
  • @FrodeF. I started from `System.Management.Automation.Language.Compiler.VisitBinaryExpression`. From there I got to the `PSBinaryOperationBinder` class. Then I inspected the base class's `BinaryOperationBinder.Bind` method. It calls the `DynamicMetaObject.BindBinaryOperation` method, which, if not overridden, calls back to the `BinaryOperationBinder.FallbackBinaryOperation` method. So I inspected `PSBinaryOperationBinder.FallbackBinaryOperation`, and there we already have all the `CompareXX` methods. – user4003407 Mar 20 '16 at 11:36
  • @PetSerAl: Thanks for corrections and great background info, I've substantially revised the answer - let me know if something still looks off. – mklement0 Mar 20 '16 at 14:21
  • @mklement0 Yes, `-ceq` applied to `[char]` operands does numeric comparison, but it does not always yield the same results as string semantics: `'Ω' -ceq 'Ω'` vs. `[char]'Ω' -ceq [char]'Ω'`. (`U+2126` *Ohm Sign* vs. `U+03A9` *Greek Capital Letter Omega*) – user4003407 Mar 20 '16 at 15:15
  • Thanks again, @PetSerAl - answer updated. Now that the fog has cleared in my head, I conclude: (a) not forcing string semantics with `i` and `c`-prefixed operators is counter-intuitive (though I now understand why that's not an option), and (b) special-casing `-eq` / `-ieq` for `[char]` was an unfortunate design decision. I sincerely appreciate your ability to illustrate your points with succinct code snippets. – mklement0 Mar 20 '16 at 16:54
  • `-ieq` also uses numeric comparison when comparing `char`s, but it applies `[char]::ToUpperInvariant` to both operands to do so in a case-insensitive manner. You can use the same *Ohm/Omega* snippet to see that. So, in terms of `[System.StringComparison]`, you can say: when comparing `char`s, `-ieq` and `-ine` use `OrdinalIgnoreCase` comparison, while all other operators use `Ordinal` comparison; when comparing strings, `i`-prefixed operators use `InvariantCultureIgnoreCase`, while `c`-prefixed ones use `InvariantCulture` comparison; when comparing anything else, the prefix is ignored/not used. – user4003407 Mar 20 '16 at 22:02

Not quite sure what to post here other than the comparisons are all correct when dealing with strings/characters. If you want an Ordinal comparison, do an Ordinal comparison and you get results based on that.

Best Practices for Using Strings in the .NET Framework

[string]::Compare('L','l')
returns 1

and

[string]::Compare("L","l", [stringcomparison]::Ordinal)
returns -32

Not sure what to add here to help clarify.

Also see: Upper vs Lower Case

Community
Kory Gill
  • Thanks for the answer. See updated question as I cleaned it up a bit. I don't want to specify the comparison method, I just want to understand how `-lt`/`-gt` works by default, because `"L" -lt "l"` (False because equal) does not return the same as `([char]"L") -lt ([char]"l")` (True). If operators by default are case-insensitive, shouldn't both return `False` because they are equal? – Frode F. Mar 19 '16 at 07:47