
I suddenly can’t seem to split a string, which I’ve done successfully before. The string is a list of exclusions for Get-ChildItem, which I want to display in a more readable format so the user can easily verify they’re correct.

I’m using PowerShell 7.3.4. The code:

# Regular expression of files/folders to exclude from the search
$exclusions =  "\.bak| - Copy|\.db|\.ini|\.log|\.temp|template|test|\.tmp|New-Folder|prototype|_vti"
$substrings = $exclusions -split "|"
$substrings
$count      = $substrings.count
Write-Host "Number of exclusions: $count"
foreach ($string in $substrings) {
    Write-Host "► $_"
}

The only output correctly says “Number of exclusions: 12,” but never shows them.

The backslashes tell the regular expression parser to treat the periods literally. Could the backslashes be interfering with the -Split call?

aksarben

2 Answers


Two problems:

  • You need to escape | in the regex pattern (an unescaped | is the regex alternation operator), e.g. ... -split '\|'
  • You're using $string as an iterator variable, but reference $_ in the loop
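To see both problems in isolation, here is a minimal sketch (the sample string 'a|b' is made up for illustration). An unescaped | is an alternation between two empty patterns, so it matches the empty string at every position, and -split chops the input into single characters (plus empty strings at the ends) instead of splitting at the pipe characters. Separately, the foreach statement assigns each element only to the variable you name; it does not set the automatic $_ variable, so in a script "► $_" typically expands to just "► ".

```powershell
# Unescaped '|' is an empty-pattern alternation: it matches the empty
# string between every pair of characters, so nothing useful is split off.
$bad  = 'a|b' -split '|'
# Escaped '\|' matches a literal pipe character.
$good = 'a|b' -split '\|'

"Unescaped: $($bad.Count) elements"   # single characters plus empty strings
"Escaped:   $($good.Count) elements"  # 'a' and 'b'

# The foreach *statement* binds each element to $item only;
# it does not populate $_.
foreach ($item in $good) {
    "foreach statement: [$item]"
}

# The ForEach-Object *cmdlet*, by contrast, does populate $_.
$good | ForEach-Object { "ForEach-Object: [$_]" }
```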

Fix both problems and it'll work as expected:

$exclusions =  "\.bak| - Copy|\.db|\.ini|\.log|\.temp|template|test|\.tmp|New-Folder|prototype|_vti"
$substrings = $exclusions -split "\|"
$count      = $substrings.count
Write-Host "Number of exclusions: $count"
foreach ($string in $substrings) {
    Write-Host "► $string"
}

Output:

Number of exclusions: 12
► \.bak
►  - Copy
► \.db
► \.ini
► \.log
► \.temp
► template
► test
► \.tmp
► New-Folder
► prototype
► _vti
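If you'd rather keep using $_ as in the original loop, a pipeline with the ForEach-Object cmdlet is an alternative to the foreach statement, because it binds each element to $_. A minimal sketch, reusing the exclusion pattern from the question:

```powershell
$exclusions = "\.bak| - Copy|\.db|\.ini|\.log|\.temp|template|test|\.tmp|New-Folder|prototype|_vti"
$substrings = $exclusions -split '\|'

Write-Host "Number of exclusions: $($substrings.Count)"
# ForEach-Object sets the automatic $_ variable to the current pipeline object.
$substrings | ForEach-Object { Write-Host "► $_" }
```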
Mathias R. Jessen

To complement Mathias R. Jessen's effective solution with more detailed background information (focusing just on the -split operation):

By default, PowerShell's -split operator interprets its separator operand as a regex (regular expression).

Therefore, regex metacharacters such as | that you want to use verbatim must be escaped:

  • Either individually, by escaping each metacharacter with \:

    'a|b|c' -split '\|'  # -> @('a', 'b', 'c')
    
  • Or, to treat an entire string (or a character not known in advance) verbatim, with [regex]::Escape():

    'a|b|c' -split [regex]::Escape('|')
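[regex]::Escape() pays off most when the separator contains several metacharacters or comes from a variable whose content you don't control. A minimal sketch with a made-up two-character separator, '.|', both of whose characters are regex metacharacters:

```powershell
$separator = '.|'                        # hypothetical separator; '.' and '|' are both metacharacters
$escaped   = [regex]::Escape($separator) # -> '\.\|'

$parts = 'a.|b.|c' -split $escaped
"Escaped pattern: $escaped"
$parts                                   # a, b, c

# Without escaping, '.|' means "any character OR the empty string", so every
# character is consumed by a match and all resulting pieces are empty strings.
$unescaped = 'a.|b.|c' -split $separator
```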
    

Alternatively, use the (rarely used) SimpleMatch option with -split:

  • Somewhat awkwardly, you must then also specify the normally optional second operand, which limits the number of tokens returned; 0 signifies the default behavior of returning all tokens.

    'a|b|c' -split '|', 0, 'SimpleMatch'
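Applied to the exclusion pattern from the question, a minimal sketch of the SimpleMatch form could look like this; the last line illustrates what a non-zero limit does, namely that the final token receives the unsplit remainder:

```powershell
$exclusions = "\.bak| - Copy|\.db|\.ini|\.log|\.temp|template|test|\.tmp|New-Folder|prototype|_vti"

# 0 = no limit on the number of tokens; 'SimpleMatch' = treat '|' literally.
$all = $exclusions -split '|', 0, 'SimpleMatch'
"Number of exclusions: $($all.Count)"   # 12

# With a limit of 2, splitting stops after the first separator.
$limited = 'a|b|c' -split '|', 2, 'SimpleMatch'   # 'a', 'b|c'
```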
    

Another alternative is to use the [string] type's .Split() .NET method, which invariably treats its separator character(s) / strings verbatim, and therefore doesn't require escaping:

# See caveat below re using (multi-character) *strings* as separators.
'a|b|c'.Split('|')
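As a minimal sketch, applying .Split('|') to the question's exclusion pattern yields the same 12 tokens without any escaping:

```powershell
$exclusions = "\.bak| - Copy|\.db|\.ini|\.log|\.temp|template|test|\.tmp|New-Folder|prototype|_vti"

# A single-character separator is matched verbatim by the .NET method.
$substrings = $exclusions.Split('|')
Write-Host "Number of exclusions: $($substrings.Count)"
foreach ($string in $substrings) {
    Write-Host "► $string"
}
```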
  • In the case at hand, that is arguably the simplest solution; however, there are caveats:

    • The separator is invariably matched case-sensitively, whereas -split is case-insensitive by default (as all PowerShell operators are), with the -csplit variant offering case-sensitivity when needed.

    • In order to split by a string as opposed to a single character or by any one from an array of characters, you must use the following non-obvious form in Windows PowerShell:

      # !! The [string[]] cast and the 0 argument are necessary
      # !! in Windows PowerShell, because .Split('||') would be 
      # !! the same as .Split([char[]] '||'), and therefore split by *each* '|'
      'a||b||c'.Split([string[]] '||', 0)
      
      • A new .NET method overload, available in PowerShell (Core) 7+, makes this no longer necessary. However, this change in behavior, which is outside of PowerShell's control, illustrates a pitfall regarding the long-term stability of .NET method calls from PowerShell: unless you type all arguments to precisely match the target method signature, PowerShell's automatic type conversions can end up selecting a different method overload as .NET adds new overloads.
    • A separate answer juxtaposes -split and .Split() in detail and makes the case for routinely preferring -split, except in rare cases where performance dictates otherwise.
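The case-sensitivity caveat is easy to demonstrate. A minimal sketch with a made-up input string that contains both an uppercase X and a lowercase x:

```powershell
$text = 'aXbxc'

$insensitive = $text -split 'x'    # 'a', 'b', 'c'  (-split ignores case by default)
$sensitive   = $text -csplit 'x'   # 'aXb', 'c'     (-csplit is case-sensitive)
$dotnet      = $text.Split('x')    # 'aXb', 'c'     (.Split() is always case-sensitive)

"-split : $($insensitive -join ', ')"
"-csplit: $($sensitive -join ', ')"
".Split : $($dotnet -join ', ')"
```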

mklement0
  • Does anyone know how to prevent the forum software from treating the Enter key as if I had clicked the "Add Comment" button? I can't enter more than a single paragraph now, which makes things very hard to read. – aksarben May 27 '23 at 18:01
  • @aksarben, comments by design don't support line breaks - the next comment provides a summary of the limitations: – mklement0 May 27 '23 at 18:03
  • As for formatting in comments: see https://stackoverflow.com/help/formatting - in short: you can only use _inline_ code formatting (`\`...\``) with no support for line breaks. Unlike in posts, there must be _no whitespace_ after the opening and before the closing delimiter. _Multi_-`\`` delimiters for support of embedded `\`` chars. are supported (e.g. `\`\`"\`n"\`\``). Unlike in posts, ``\`` can alternatively be used to escape `\``, which, due to the no-whitespace requirement, is a must if you want to produce something like `\`enclosed\``. To produce a formatted ``\`` alone, use `\`\`\\`\`` – mklement0 May 27 '23 at 18:03