This question is related to:
Unquoted tokens in argument mode involving variable references and subexpressions: why are they sometimes split into multiple arguments?.

I received the request "Remove quotes in HashTables Keys when possible" for my ConvertTo-Expression project.

The point is that it is not fully clear to me when keys should actually be quoted in hash tables.
As with argument values, the use of unquoted hash table keys is limited to certain characters.
Several characters (including spaces) are not allowed, e.g.:

$Options = @{
    Margin    = 2
    Padding   = 2
    Font-Size = 24
}

This causes an error:

Line |
   4 |      Font-Size = 24
     |          ~
     | Missing '=' operator after key in hash literal.

In some cases, just the order of the characters could lead to errors or even pitfalls, e.g.:

$Hashtable = @{ 
    U2 = 'YouTo'
    2U = 'ToYou'
}
$Hashtable.keys
2
U2

(This is because the 2U key is interpreted as a number of type [UInt32], meaning that $HashTable.2U correctly returns the value, but $HashTable.2, $HashTable.'2' and $HashTable.'2U' do not.)
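
The lookup behavior can be checked directly. The following is a minimal sketch, assuming PowerShell 6.2 or later (where the u suffix is recognized in number literals); the variable names are illustrative only.

```powershell
# Assumes PowerShell 6.2+, where 2U is a [uint32] number literal.
$Hashtable = @{
    U2 = 'YouTo'
    2U = 'ToYou'
}

# List each key together with its .NET type name.
$keyTypes = foreach ($key in $Hashtable.Keys) {
    '{0} [{1}]' -f $key, $key.GetType().Name
}
$keyTypes

# Only a key of the same type and value finds the entry.
$byUInt32 = $Hashtable[[uint32] 2]   # matches the [uint32] key
$byInt32  = $Hashtable[2]            # [int] 2 is a different key
$byString = $Hashtable['2U']         # the string '2U' is not a key at all
```

Under that assumption, only the [uint32] lookup should return 'ToYou'; the other two lookups return nothing.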

Besides looking for documented best practices on this, I would like to safely test whether a string needs to be quoted, something like:

IsStringConstant 'Margin'    # True
IsStringConstant 'Font-Size' # False
IsStringConstant 'U2'        # True
IsStringConstant '2U'        # False

I have been playing with AST, but that requires me to build a ScriptBlock first, which is considered unsafe.

Is there a way to check whether a string needs to be quoted to serve as a hash table key?

iRon

2 Answers

I have been playing with AST, but that requires me to build a ScriptBlock first, which is considered unsafe.

This is (fortunately) not true - you can produce an AST from source code without compiling an enclosing [scriptblock] by calling Parser.ParseInput():

$tokens = @()
$errors = @()
$AST = [System.Management.Automation.Language.Parser]::ParseInput('U2', [ref]$tokens, [ref]$errors)

In this case you don't actually need to inspect an AST. You can ascertain whether a hashtable member key is valid based on the tokens produced by parsing a dummy hashtable literal:

function Test-IsValidHashtableStringLiteral
{
    param([string]$Identifier)

    $dummyTable = '@{{{0}=$()}}' -f $Identifier

    $tokens = @()
    $errors = @()
    $null = [System.Management.Automation.Language.Parser]::ParseInput($dummyTable, [ref]$tokens, [ref]$errors)

    if($errors.Count){
        # Parsing our dummy table resulted in errors, no good
        # This would be the case for `Font-Size`
        return $false
    }

    # No errors, let's ensure the member name is recognized 
    # as an identifier and not a value literal (like 2u in PowerShell >6)
    $memberName = $tokens[1]

    return $memberName.Kind -eq 'Identifier'
}
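
To see what the function above keys on, it can help to dump the parse errors and the kind of the key token for a few candidates. This is only an illustrative sketch: the $report variable and its property names are made up here, and the token kind reported for 2U depends on the PowerShell version, as noted below.

```powershell
# Parse a dummy hashtable literal per candidate and report what the tokenizer saw.
$report = foreach ($candidate in 'Margin', 'Font-Size', 'U2', '2U') {
    $tokens = $null
    $errors = $null
    $source = '@{{{0}=$()}}' -f $candidate
    $null = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref] $tokens, [ref] $errors)

    [pscustomobject] @{
        Candidate  = $candidate
        ErrorCount = $errors.Count
        KeyKind    = $tokens[1].Kind   # the token right after the opening '@{'
    }
}
$report
```

Margin and U2 should show no errors and an Identifier token, while Font-Size should show at least one parse error.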

Please note: the result is version-specific. In other words, if you use the above technique and run ConvertTo-Expression with your 2u example on PowerShell 5.1, the resulting expression will only work correctly in 5.1; to produce a 7.0-compatible expression you'd need to run it on 7.0.

The good news is that the parser interface has remained stable since PowerShell 3.0, and the language infrastructure is very backwards compatible, so this approach will work on all versions from 3.0 through 7.x.

Mathias R. Jessen

Rules and Guidance for use of unquoted keys in hash-table literals:

  • If a key is a syntactically valid number literal, it is used as a number. This includes literals with suffixes such as u (U) for [uint32], binary multiplier suffixes such as kb, hex numbers (e.g., 0x10), and numbers in exponential notation (e.g., 1e2)[1]. This explains why the 2U key in your example turns into a numeric key with value 2 of type System.UInt32.

  • If a key starts with a digit, but is otherwise not a syntactically valid number literal, the hashtable definition fails.

    • Note: digit refers not just to the Latin decimal digits, but to all digits in the DecimalDigitNumber (Nd) Unicode category, which includes characters such as ๐ (THAI DIGIT ZERO, U+0E50).

  • Otherwise, it is interpreted as a string, but parsing only succeeds if the token is limited to a sequence of the following characters (these seem to be the same ones that can be used in variable names without needing to enclose the name in {...}; see the about_Variables help topic's Variable Names that Include Special Characters section):

    • letters, i.e. characters in the Unicode categories Lu, Ll, Lt, Lm, and Lo
    • decimal digits (Unicode category Nd)
    • the underscore (_)

To put it in terms of guidance:

Quote your hash-table keys if:

  • they should be strings but happen to start with a digit.

  • they are strings that contain whitespace or symbols other than _
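
The rules and guidance above can be illustrated with a small table that mixes number-literal keys, a plain identifier, and a quoted key. This is a sketch for illustration only; the key names and values are arbitrary.

```powershell
$table = @{
    0x10  = 'hex'          # number literal, becomes the numeric key 16
    1kb   = 'multiplier'   # number literal, becomes the numeric key 1024
    Name  = 'string'       # plain identifier, becomes a string key
    '2nd' = 'quoted'       # starts with a digit, so it has to be quoted
}

# Show each key with its .NET type name.
$keyInfo = foreach ($key in $table.Keys) {
    '{0} [{1}]' -f $key, $key.GetType().Name
}
$keyInfo
```

The number-literal keys are stored as numbers, so they have to be looked up as numbers (e.g., $table[16]), not by the text that was typed.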


Simple programmatic test if a given key can be used as an unquoted string literal:

  • The following is a simpler alternative to the parser-based approach shown in Mathias R. Jessen's answer:

  • As in Mathias' answer, the test below indicates whether the given key can be used unquoted to become a string key (rather than a numeric key with a token such as 2u).

  • iRon (the OP) himself came up with the approach, which is based on the rules in the previous section.

PS> '2U', '1a', 'Font-Size', 'Margin' | ForEach-Object {
      # Outputs $true, if the key does NOT need quoting and would 
      # become a *string* key.
      $_ -cmatch '^[\p{L}\p{Lt}\p{Lm}\p{Lo}_][\p{L}\p{Lt}\p{Lm}\p{Lo}\p{Nd}_]*$'
    }
False  # starts with digit; would work unquoted, but not as a *string* key
False  # starts with digit; not a valid number literal -> wouldn't work at all
False  # contains '-'
True   # OK
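
  • Character-class expression \p{L} covers all letter categories (\p{Lu}, \p{Ll}, \p{Lt}, \p{Lm} and \p{Lo}), i.e. both upper- and lowercase letters and more; listing \p{Lt}, \p{Lm} and \p{Lo} separately is therefore redundant, but harmless.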

  • -match, the case-insensitive counterpart to the case-sensitive -cmatch variant, would work equally here; the idea is to potentially speed up the regex matching with -cmatch, though this may not matter in practice.
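
For reuse, the regex test can be wrapped in a small function along the lines of the IsStringConstant helper asked for in the question. This is a sketch: the function name Test-UnquotedStringKey is made up here, and the pattern is shortened on the assumption that \p{L} alone is enough to match every letter category.

```powershell
function Test-UnquotedStringKey {
    # Returns $true if the key can be used unquoted AND would become a string key.
    param([string] $Key)

    # First character: a letter or underscore.
    # Remaining characters: letters, decimal digits or underscores.
    $Key -cmatch '^[\p{L}_][\p{L}\p{Nd}_]*$'
}

$results = [ordered] @{}
foreach ($candidate in 'Margin', 'Font-Size', 'U2', '2U') {
    $results[$candidate] = Test-UnquotedStringKey $candidate
}
$results
```

A serializer would then quote a key whenever this test returns $false.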


[1] Caveat: Given that hash-table literals are also used in the construction of [pscustomobject] literals using the [pscustomobject] @{ ... } syntactic sugar (available in PSv3+), PowerShell has to convert numeric keys to strings, because property names can only be strings; this can lead to surprising behavior; e.g.,
[pscustomobject] @{ 0xa = 'ten' } constructs an object with property name '10', which is the string representation of the number that number literal 0xa represents.
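
The conversion described in this footnote can be verified with a short check; the variable names are illustrative only.

```powershell
# The hex literal 0xa is a numeric key (10); as a property name it becomes the string '10'.
$obj = [pscustomobject] @{ 0xa = 'ten' }

$propertyNames = @($obj.psobject.Properties.Name)
$propertyNames          # the property names as strings

$value = $obj.'10'      # access the property by its string name
$value
```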

mklement0