This is a very simple example:
$Test = @('ae','æ')
$Test | Select-Object -Unique
The output:
ae
What is going on here, and how can I avoid it? Obviously, I do not want "ae" to be equal to "æ".
As mentioned in the comments, your current culture settings treat ae and æ as equal, so it's only returning the first one in the input array. If you reverse the order, you'll get æ instead:
$Test = @('æ','ae')
$Test | Select-Object -Unique
# æ
You can check which culture PowerShell is using with this:
PS> Get-Culture
LCID Name DisplayName
---- ---- -----------
2057 en-GB English (United Kingdom)
Note, however, that per @mklement0's comment, PowerShell doesn't use this culture consistently for everything:
Turns out that the current culture indeed applies to Select-Object -Unique (which is currently unexpectedly also (invariably) case-sensitive). It seems that PowerShell has a split personality with respect to culture invariance: [string] casts, string interpolation and string-relevant operators (except >) use the invariant culture, whereas cmdlets use the current one.
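One way to observe that split, shown here with number formatting rather than string comparison: `[string]` casts and string interpolation always produce the invariant `1.5`, whereas a .NET formatting call honors whatever culture it is handed. `de-DE` is just an example of a culture whose decimal separator is a comma:

```powershell
# [string] casts and string interpolation use the invariant culture
[string] 1.5    # 1.5
"$(1.5)"        # 1.5

# A .NET formatting method uses the culture passed to it ('de-DE' is only an example)
(1.5).ToString([cultureinfo]::GetCultureInfo('de-DE'))   # 1,5
```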
In any case, rather than a culture-aware comparison, it sounds like what you're after is an "ordinal" comparison; for more details, see Ordinal String Operations:
Ordinal comparisons are string comparisons in which each byte of each string is compared without linguistic interpretation; for example, "windows" does not match "Windows".
(And, by extension, ae does not equal æ.)
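Concretely, the two strings differ at the level of their UTF-16 code units, which is all an ordinal comparison looks at. A small sketch using only built-in casts and `[string]::CompareOrdinal`; `$ligature` is just an illustrative variable holding æ built from its code point (U+00E6), so the example doesn't depend on how the script file is encoded (Windows PowerShell reads BOM-less .ps1 files as ANSI):

```powershell
# 'æ' (U+00E6), built from its code point so the script's file encoding doesn't matter
$ligature = [string][char]0xE6

# The UTF-16 code units an ordinal comparison compares one by one
[int[]][char[]]'ae'       # 97, 101
[int[]][char[]]$ligature  # 230

# Negative result: 'a' (97) sorts before 'æ' (230); no linguistic rules are applied
[string]::CompareOrdinal('ae', $ligature)
```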
I can't find an idiomatic way to do that in PowerShell (you can change the culture with Set-Culture, but all the ones I tried still treat ae as equal to æ), but if you want more control over how values are compared, you could drop down into LINQ like this:
PS> $data = @( "ae", "æ" )
PS> [System.Linq.Enumerable]::Distinct([string[]]$data, [System.StringComparer]::Ordinal )
ae
æ
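If you'd rather stay in the pipeline, the same ordinal comparison can be wrapped in a small function. This is a sketch, not a built-in: `Select-UniqueOrdinal` and its `-IgnoreCase` switch are hypothetical names, and it assumes string input. It uses a `HashSet[string]` with an ordinal `StringComparer`, so, like `Select-Object -Unique`, it keeps the first occurrence of each value and preserves input order:

```powershell
# Hypothetical helper (not a built-in cmdlet): Select-Object -Unique, but ordinal
function Select-UniqueOrdinal {
    param(
        [Parameter(ValueFromPipeline)] [string] $InputObject,
        [switch] $IgnoreCase
    )
    begin {
        $comparer = if ($IgnoreCase) { [System.StringComparer]::OrdinalIgnoreCase } else { [System.StringComparer]::Ordinal }
        $seen = [System.Collections.Generic.HashSet[string]]::new($comparer)
    }
    process {
        # HashSet.Add() returns $true only the first time a value is added
        if ($seen.Add($InputObject)) { $InputObject }
    }
}

$ligature = [string][char]0xE6   # 'æ', spelled by code point
'ae', $ligature, 'ae' | Select-UniqueOrdinal   # ae, æ
'ae', 'AE' | Select-UniqueOrdinal -IgnoreCase  # ae
```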
You've then got a whole bunch of different ways to compare strings:
https://learn.microsoft.com/en-us/dotnet/api/system.stringcomparer?view=net-6.0#properties
CurrentCulture - Gets a StringComparer object that performs a case-sensitive string comparison using the word comparison rules of the current culture.
CurrentCultureIgnoreCase - Gets a StringComparer object that performs case-insensitive string comparisons using the word comparison rules of the current culture.
InvariantCulture - Gets a StringComparer object that performs a case-sensitive string comparison using the word comparison rules of the invariant culture.
InvariantCultureIgnoreCase - Gets a StringComparer object that performs a case-insensitive string comparison using the word comparison rules of the invariant culture.
Ordinal - Gets a StringComparer object that performs a case-sensitive ordinal string comparison.
OrdinalIgnoreCase - Gets a StringComparer object that performs a case-insensitive ordinal string comparison.
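To try these comparers side by side, you can look each one up by name and feed it to the same `Distinct` call. A sketch, assuming the three sample strings shown: only the two ordinal rows are predictable (`ae, æ, AE` for `Ordinal`, `ae, æ` for `OrdinalIgnoreCase`), while the culture-based rows depend on your current culture and on whether .NET uses NLS or ICU:

```powershell
$ligature = [string][char]0xE6   # 'æ', spelled by code point
[string[]] $data = 'ae', $ligature, 'AE'

$names = 'CurrentCulture', 'CurrentCultureIgnoreCase',
         'InvariantCulture', 'InvariantCultureIgnoreCase',
         'Ordinal', 'OrdinalIgnoreCase'

$results = foreach ($name in $names) {
    # Look up the static StringComparer property by its name
    $comparer = [System.StringComparer]::$name
    [pscustomobject]@{
        Comparer = $name
        # @() forces the lazy LINQ result to be enumerated before joining
        Distinct = @([System.Linq.Enumerable]::Distinct($data, $comparer)) -join ', '
    }
}
$results | Format-Table -AutoSize
```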
And you can even implement your own:
class FirstLetterComparer : System.Collections.Generic.IEqualityComparer[string] {
[bool]Equals([string]$x, [string]$y) { return $x[0] -eq $y[0]; }
[int]GetHashCode([string] $x) { return $x[0].GetHashCode(); }
}
# returns the first item in the list that starts with each distinct character.
# note that "abb" is omitted because it starts with the same first letter as "aaa"
# so it's not "first letter distinct".
$data = @( "aaa", "abb", "bbb" )
[System.Linq.Enumerable]::Distinct([string[]]$data, [FirstLetterComparer]::new() )
# aaa
# bbb
To add to mclayton's excellent answer, with background information:
While PowerShell does indeed use the current culture with cmdlets such as Select-Object, there are contexts in which it uses the invariant culture, notably the -eq / -ne operators; see this answer.
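For a single pair of strings, where you might otherwise reach for `-eq` / `-ceq`, .NET's `String.Equals` gives you an explicit ordinal comparison instead. In this sketch the `-eq` / `-ceq` lines are deliberately left without an expected result, because that can depend on the PowerShell edition; the ordinal lines are `False` either way:

```powershell
$ligature = [string][char]0xE6   # 'æ', spelled by code point

# Culture-aware (invariant culture), so the result may vary by edition
'ae' -eq $ligature
'ae' -ceq $ligature

# Ordinal comparisons: code unit by code unit, no linguistic equivalences
'ae'.Equals($ligature)                                                          # False
[string]::Equals('ae', $ligature, [System.StringComparison]::OrdinalIgnoreCase)  # False
```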
PowerShell has two distinct editions, and they differ with respect to the behavior at hand, due to what edition of .NET they're built on:
Windows PowerShell, the legacy, ships-with-Windows edition, whose latest and final version is 5.1, which builds on the legacy, Windows-only .NET Framework, which uses NLS (National Language Support) for culture-specific information.
PowerShell (Core) 7+, which builds on the cross-platform .NET 5+ edition, which now uses the ICU (International Components for Unicode) library by default, though on Windows you can opt into still using NLS.
[cultureinfo]::CurrentCulture.NumberFormat.NumberDecimalDigits
), in effect the NLS settings are still used; see .NET globalization and ICU and GitHub issue #81853. Read on for details.
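To check which of the two your session is actually using, the ".NET globalization and ICU" page describes a check based on the invariant culture's `SortVersion`. The following is a PowerShell port of that check as I understand it; `Test-IcuMode` is a hypothetical helper name, and the result is best treated as a heuristic rather than an official API:

```powershell
# 'Desktop' = Windows PowerShell 5.1 (.NET Framework), 'Core' = PowerShell 7+ (.NET 5+)
$PSVersionTable.PSEdition

# Hypothetical helper: under ICU, the invariant culture's SortVersion.SortId
# encodes the same number as SortVersion.FullVersion
function Test-IcuMode {
    $sortVersion = [cultureinfo]::InvariantCulture.CompareInfo.Version
    $bytes = $sortVersion.SortId.ToByteArray()
    $version = ([int]$bytes[3] -shl 24) -bor ([int]$bytes[2] -shl 16) -bor ([int]$bytes[1] -shl 8) -bor [int]$bytes[0]
    return ($version -ne 0) -and ($version -eq $sortVersion.FullVersion)
}

Test-IcuMode
```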
æ is a ligature that is formed from the letters a and e.
Windows PowerShell / NLS:
The ligature æ is considered equivalent to the sequence of its constituent letters in most cultures, except in those in which æ is in use as a character in its own right ... These exceptions are (only the so-called neutral (non-nation-specific) cultures are listed, not also their national varieties):
Other ligatures have multi-letter equivalents in all cultures, such as œ vs. oe; there are also ligatures whose multi-letter equivalent is not the sequence of their constituent letters but a modern equivalent, e.g., German ß (which originated from sz) is considered equivalent to ss.
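To see how a particular culture treats such pairs on your own system, you can ask its `CompareInfo` directly. In this sketch `da-DK` is just an example culture (Danish uses æ as a letter of its own), and no expected output is given, because the results differ between NLS and ICU:

```powershell
# æ (U+00E6), œ (U+0153) and ß (U+00DF), spelled by code point so the script's
# file encoding doesn't matter, each paired with its multi-letter counterpart
$pairs = ([string][char]0xE6, 'ae'), ([string][char]0x153, 'oe'), ([string][char]0xDF, 'ss')

# 'da-DK' is only an example of a culture to compare against
$cultures = [cultureinfo]::InvariantCulture, [cultureinfo]::CurrentCulture, [cultureinfo]::GetCultureInfo('da-DK')

$results = foreach ($culture in $cultures) {
    $label = if ($culture.Name) { $culture.Name } else { '(invariant)' }
    foreach ($pair in $pairs) {
        [pscustomobject]@{
            Culture = $label
            Pair    = '{0} vs. {1}' -f $pair[0], $pair[1]
            # Compare() returns 0 when the culture's collation rules deem the strings equal
            Equal   = $culture.CompareInfo.Compare($pair[0], $pair[1], [System.Globalization.CompareOptions]::None) -eq 0
        }
    }
}
$results | Format-Table -AutoSize
```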
PowerShell (Core) 7+ / ICU: