
I have a script that works perfectly fine in PowerShell 5.x but no longer works in PowerShell Core (7.2.1).

The problem happens when I try to split text that was copied and pasted from an email.

It all comes down to this part of the code:

$test="blue
green
yellow
"

#$test.Split([Environment]::NewLine)

$x = $test.Split([Environment]::NewLine)

$x[0]
$x[1]

In PowerShell 5, $x[0] is blue and $x[1] is green. But in PowerShell Core, the split doesn't do anything and $x[1] doesn't exist.

In PowerShell 7, line breaks seem to be handled differently (at least that's what I assume), but I couldn't find a solution.

I tried changing the code to $rows = $path.split([Environment]::NewLine) and $path.Split([System.Environment]::NewLine, [System.StringSplitOptions]::RemoveEmptyEntries), but that doesn't change anything.

Also, when I use a here-string:

$test = @'
green

yellow
blue

white
'@
$x= $test -split "`r`n", 5, "multiline"

everything except $x[0] is empty (e.g. $x[2]).
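One way to see why splitting by "`r`n" leaves everything in a single element is to compare the same text with LF-only and CRLF newlines. The sketch below builds both variants explicitly instead of relying on a pasted here-string; the variable names are illustrative. As far as I can tell from about_Split, the "multiline" option only changes how ^ and $ match, so it has no effect on a separator without anchors.

```powershell
# Same text, once with LF-only newlines (what an interactive prompt produces)
# and once with CRLF newlines (e.g. a script saved with Windows line endings).
$lf   = "green`n`nyellow`nblue`n`nwhite"
$crlf = $lf.Replace("`n", "`r`n")

($lf   -split "`r`n").Count   # 1: there is no CRLF sequence to split on
($crlf -split "`r`n").Count   # 6: green, '', yellow, blue, '', white

# The regex \r?\n makes the CR optional, so it handles both formats.
($lf   -split '\r?\n').Count  # 6
($crlf -split '\r?\n').Count  # 6
```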

I was already looking here: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_split?view=powershell-7.2

And here: powershell -split('') specify a new line

And here: WT: Paste multiple lines to Windows Terminal without executing

So far I have not found a solution to my problem.

Any help is appreciated.

EDIT: I found a hint about that problem, but don't understand the implications of it yet: https://n-v-o.github.io/2021-06-10-String-Method-in-Powershell-7/

EDIT 2: Thanks, everyone, for answering my question. At first I was going to write a long explanation of why my question is different from the duplicate suggested by @SantiagoSquarzon. But while reading the answers to my question and to the other question, I noticed I was doing something differently.

Apparently there is a difference depending on which of these I use:

$splits = $test -split '`r?`n' # doesn't work in 5.1 and 7.2.1
$splits = $test -split '\r?\n' # works in 5.1 and 7.2.1 as suggested from Santiago and others

BUT

$splits = $test.Split("\r?\n") # doesn't work in 5.1 and 7.2.1
$splits = $test.Split("`r?`n") # doesn't work in 5.1 and 7.2.1
$splits = $test.Split([char[]]"\r\n") # doesn't work in 7.2.1
$splits = $test.Split([char[]]"`r`n") # works in 7.2.1
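The difference between the first two lines comes down to which layer interprets the escapes: inside '...' the backticks are literal characters, while \r and \n are regex escapes that -split passes to the .NET regex engine. A small sketch with made-up sample text:

```powershell
$text = "blue`r`ngreen`nyellow"

# In a verbatim string '...', `r and `n are NOT expanded, so this regex looks
# for a literal backtick, an optional 'r', another backtick, and 'n'.
$wrong = $text -split '`r?`n'

# \r and \n are regex escapes, interpreted by the .NET regex engine.
$right = $text -split '\r?\n'

$wrong.Count   # 1: no match, the whole string is returned
$right.Count   # 3: blue, green, yellow
```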
Dan Stef
  • Your first example works fine for me on PS Core (Linux). And on your second example try with `-split "\r?\n"` – Santiago Squarzon Dec 16 '21 at 13:39
  • Thanks for your answer. I'm on Windows 10 (20H2). Same problem with "`r`n" – Dan Stef Dec 16 '21 at 13:57
  • Your first example doesn't work for me on PS7, but it does work if I use ``$test.Split("`n")``. I think some of this will behave differently depending on your host environment (mine is Windows) – Otter Dec 16 '21 at 13:58
  • `-split "\r?\n"` should work for all cases, be it Windows / Linux / mac / PS Core / Win PS. – Santiago Squarzon Dec 16 '21 at 14:00
  • Yes, I agree. It SHOULD. But it doesn't, at least not for me. But I found a solution; I will post an answer in a minute. – Dan Stef Dec 16 '21 at 14:05
  • Note that `.Split([Environment]::NewLine)` does work as intended, but only in PowerShell _Core_ (v6+), and only with platform-native newlines, i.e. CRLF on Windows. All your symptoms point to your input string using CR-only newlines, which is highly unusual (What email client are you copying from? That said, most editors convert newlines to match the format used in the target file on pasting). – mklement0 Dec 16 '21 at 19:16
  • @mklement0: I copied the text from an Outlook email. But I also pasted the text into Notepad++ and made sure that all the line endings were CRLF. – Dan Stef Dec 17 '21 at 10:58
  • I see. If you truly have CRLF newlines, note that the solutions in your own answer as well as ``$test.Split([char[]]"`r`n")`` in your latest edit won't work as intended, because you'll get extra, empty elements representing the empty strings between each CR and LF sequence, due to matching these chars. _individually_. Your latest edit also shows a misconception with respect to which escape sequences you can use where, as well as use of escape sequences in verbatim vs. expandable strings - please see the update to my answer. /cc @SantiagoSquarzon – mklement0 Dec 17 '21 at 16:41



tl;dr:

  • Use -split '\r?\n' to split multiline text into lines irrespective of whether Windows-format CRLF or Unix-format LF newlines are used (it even handles a mix of these formats in a single string).

  • If you additionally want to handle CR-only newlines (which would be unusual, but appears to be the case for you), use -split '\r?\n|\r'

  • On Windows, with CRLF newlines only, .Split([Environment]::NewLine) only works as intended in PowerShell (Core) 7+, not in Windows PowerShell (and, accidentally, in Windows PowerShell only with CR-only newlines, as in your case.) To explicitly split by CR only, .Split("`r") would happen to work as intended in both editions, due to splitting by a single character only.

# Works on both Unix and Windows, in both PowerShell editions.
# Input string contains a mix of CRLF and LF and CR newlines.
"one`r`ntwo`nthree`rfour" -split '\r?\n|\r' | ForEach-Object { "[$_]" }

Output:

[one]
[two]
[three]
[four]

This is the most robust approach, as you generally cannot rely on input text to use the platform-native newline format, [Environment]::NewLine; see the bottom section for details.

Note:

  • The above uses PowerShell's -split operator, which operates on regexes (regular expressions), which enables the flexible matching logic shown above.

  • By contrast, the System.String.Split() .NET method only splits by literal strings, which, while faster, limits you to finding verbatim separators.

  • The syntax implications are:

    • Regex constructs such as escape sequences \r (CR) and \n (LF) are only supported by the .NET regex engine and therefore only by -split (and other PowerShell contexts where regexes are being used); ditto for regex metacharacters ? (match the preceding subexpression zero or one time) and | (alternation; match the subexpression on either side).
      Inside strings (which is how regexes must be represented in PowerShell, preferably inside '...'), these sequences and characters have no special meaning, neither to PowerShell itself nor to the .Split() method, which treats them all verbatim.

    • By contrast, the analogous escape sequences "`r" (CR) and "`n" (LF) are PowerShell features, available in expandable strings, i.e. they work only inside "..." - not also inside verbatim strings, '...' - and are expanded to the characters they represent before the target operator, method, or command sees the resulting string.

  • This answer discusses -split vs. .Split() in more depth and recommends routine use of -split.
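To make the two layers of escaping concrete, here is a short sketch; the sample strings are made up, and SimpleMatch is the -split option documented in about_Split for literal (non-regex) matching.

```powershell
# PowerShell escape sequences are expanded only inside expandable strings ("...").
"`n".Length    # 1: a single LF character
'`n'.Length    # 2: a literal backtick followed by 'n'

# Regex escapes such as \n are interpreted by -split, not by the string parser.
("a`nb" -split '\r?\n').Count                        # 2
('a\r?\nb' -split '\r?\n').Count                     # 1: the text contains no actual newline

# The SimpleMatch option makes -split treat the separator as literal text.
('a\r?\nb' -split '\r?\n', 0, 'SimpleMatch').Count   # 2
```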


As for what you tried:

  • Use [Environment]::NewLine only if you are certain that the input string uses the platform-native newline format. Notably, multiline string literals entered interactively at the PowerShell prompt use Unix-format LF newlines even on Windows (the only exception is the obsolescent Windows-only ISE, which uses CRLF).

  • String literals in script files (*.ps1) use the same newline format that the script is saved in - which may or may not be the platform's format.

  • Additionally, as you allude to in your own answer, the addition of a string parameter overload in the System.String.Split() method in .NET Core / .NET 5+ - and therefore PowerShell (Core) v6+ - implicitly caused a breaking change relative to Windows PowerShell: specifically, .Split('ab') splits by 'a' or 'b' - i.e. by any of the individual characters that make up the string - in Windows PowerShell, whereas it splits by the whole string, 'ab', in PowerShell (Core) v6+.

    • Such implicit breaking changes are rare, but they do happen, and they're outside PowerShell's control.

    • For that reason, you should always prefer PowerShell-native features for long-term stability, which in this case means preferring the -split operator to the .Split() .NET method.

      • That said, sometimes .NET methods are preferable for performance reasons; you can make them work robustly, but only if you carefully match the exact data types of the method overloads of interest, which may require casts; see below.
    • See this answer for more information, including a more detailed explanation of the implicit breaking change.
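The char-versus-string difference can be reproduced in either edition by casting explicitly, which selects the same .NET overload in both. The separator 'XY' and the sample string are arbitrary choices for this sketch:

```powershell
$s = 'oneXYtwoXthreeYfour'

# [char[]] overload: split at ANY of the characters 'X' or 'Y'
# (what .Split('XY') does in Windows PowerShell).
$byChar = $s.Split([char[]] 'XY')
$byChar.Count     # 5: one, '', two, three, four

# [string[]] overload: split only at the complete string 'XY'
# (what .Split('XY') does in PowerShell (Core) 7+).
$byString = $s.Split([string[]] 'XY', [StringSplitOptions]::None)
$byString.Count   # 2: one, twoXthreeYfour
```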

Your feedback on -split '\r?\n' not working for you and the solutions in your own answer suggest that your input string - unusually - uses CR-only newlines.
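To confirm or rule out that hypothesis instead of guessing, you could count each newline style in the actual input. The function name Get-NewlineCount is made up for this sketch; it relies only on [regex]::Matches() with lookbehind and lookahead assertions:

```powershell
# Hypothetical helper: count CRLF, LF-only, and CR-only newlines in a string.
function Get-NewlineCount {
  param([string] $Text)
  [pscustomobject] @{
    CRLF   = [regex]::Matches($Text, '\r\n').Count
    LFOnly = [regex]::Matches($Text, '(?<!\r)\n').Count   # LF not preceded by CR
    CROnly = [regex]::Matches($Text, '\r(?!\n)').Count    # CR not followed by LF
  }
}

# Sample input mixing all three styles: one of each.
Get-NewlineCount "blue`r`ngreen`nyellow`rwhite"
```

Passing the pasted text (e.g. Get-NewlineCount $test) would show which format it really uses.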

Your answer's solutions would not work as expected with Windows-format CRLF-format text, because splitting would happen for each CR and LF in isolation, which would result in extra, empty elements in the output array (each representing the empty string "between" a CRLF sequence).
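The extra empty elements are easy to reproduce with explicit CRLF input (sample text made up). The last lines also show why RemoveEmptyEntries is not a clean fix: it hides the symptom but also drops genuinely blank lines.

```powershell
$crlf = "one`r`ntwo`r`nthree"

# Splitting by CR and LF individually produces an empty element
# for the zero-length gap between each CR and its LF.
$perChar = $crlf.Split([char[]] "`r`n")
$perChar.Count   # 5: one, '', two, '', three

# A regex treats each CRLF sequence as a single separator.
$byRegex = $crlf -split '\r?\n'
$byRegex.Count   # 3

# RemoveEmptyEntries also removes the intentional blank line here.
"one`r`n`r`ntwo".Split([char[]] "`r`n", [StringSplitOptions]::RemoveEmptyEntries).Count   # 2
("one`r`n`r`ntwo" -split '\r?\n').Count                                                  # 3
```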

If you did want to split by [Environment]::NewLine on Windows - i.e. by CRLF - and you wanted to stick with the .Split() method, in order to make it work in Windows PowerShell too, you'd need to call the overload that expects a [string[]] argument, indicating that each string (even if only one) is to be used as a whole as the separator - as opposed to splitting by any of its individual characters:

# On Windows, split by CRLF only.
# (Would also work on Unix with LF-only text.)
# In PowerShell (Core) 7+ only, .Split([Environment]::NewLine) would be enough.
"one`r`ntwo`r`nthree".Split([string[]] [Environment]::NewLine, [StringSplitOptions]::None) |
  ForEach-Object { "[$_]" }

Output:

[one]
[two]
[three]

While this is obviously more ceremony than using -split '\r?\n', it does have the advantage of performing better - although that will rarely matter. See the next section for a generalization of this approach.


Using an unambiguous .Split() call for improved performance:

Note:

  • This is only necessary if -split '\r?\n' or -split '\r?\n|\r' turns out to be too slow in practice, which won't happen often.

  • To make this work robustly, in both PowerShell editions as well as long-term, you must carefully match the exact data types of the .Split() overload of interest.

  • The command below is the equivalent of -split '\r?\n|\r', i.e. it matches CRLF, LF, and CR newlines. Adapt the array of strings for more restrictive matching.

# Works on both Unix and Windows, in both PowerShell editions
"one`r`ntwo`nthree`rfour".Split(
  [string[]] ("`r`n", "`n", "`r"),
  [StringSplitOptions]::None
) | ForEach-Object { "[$_]" }
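If you want to check whether the faster call actually matters for your data, a rough sketch like the following first verifies that both approaches agree and then times them. The 10,000-line sample is synthetic, and Measure-Command timings vary by machine and edition, so no numbers are shown:

```powershell
# Synthetic input: 10,000 lines cycling through CRLF, LF, and CR newlines.
$newlines = "`r`n", "`n", "`r"
$text = -join (1..10000 | ForEach-Object { "line$_" + $newlines[$_ % 3] })

$separators  = [string[]] ("`r`n", "`n", "`r")
$viaOperator = $text -split '\r?\n|\r'
$viaMethod   = $text.Split($separators, [StringSplitOptions]::None)

# Both approaches should produce the same elements in the same order.
$viaOperator.Count -eq $viaMethod.Count
($viaOperator -join ',') -eq ($viaMethod -join ',')

# Rough timings in milliseconds.
(Measure-Command { $null = $text -split '\r?\n|\r' }).TotalMilliseconds
(Measure-Command { $null = $text.Split($separators, [StringSplitOptions]::None) }).TotalMilliseconds
```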
mklement0

The reason: When pasting text into the terminal, it matters which terminal you are using. The default PowerShell 5.1 terminal, the ISE, and most other Windows software separate new lines with both a carriage return \r and a newline \n character. We can check by converting to bytes:

# 5.1 Desktop
$test = "a
b
c"
[byte[]][char[]]$test -join ','

97,13,10,98,13,10,99
#a,\r,\n, b,\r,\n, c

PowerShell Core separates new lines with only a newline \n character:

# 7.2 Core
$test = "a
b
c"
[byte[]][char[]]$test -join ','

97,10,98,10,99

On Windows OS, [Environment]::NewLine is \r\n no matter which console. On Linux, it is \n.
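A variant of the byte check above that casts to [int] rather than [byte] (the quibble raised in the comments) also works for characters above 0xFF, and can be pointed at [Environment]::NewLine directly. Show-CodePoints is a hypothetical helper name used only for this sketch:

```powershell
# Hypothetical helper: list the code point of each character in a string.
# [int] is used instead of [byte], because a [byte] cast fails above 0xFF.
function Show-CodePoints([string] $Text) {
  [int[]] [char[]] $Text -join ','
}

Show-CodePoints "a`r`nb`nc"               # 97,13,10,98,10,99
Show-CodePoints ([Environment]::NewLine)  # 13,10 on Windows; 10 on Linux/macOS
Show-CodePoints ("a" + [char] 0x26A0)     # 97,9888 (a [byte] cast would fail here)
```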


The solution: split multiline strings on either \r\n or \n (but not on only \r). The easy way here is with regex like @Santiago-squarzon suggests:

$splits = $test -split '\r?\n'
$splits[0]
a
$splits[1]
b
Cpt.Whale
  • Will up-vote because I agree with your answer but still feel the question should be closed for the duplicate I linked. It's just that OP refused to try out the options given to him. – Santiago Squarzon Dec 16 '21 at 15:24
  • @Santiago, it is _mostly_ a duplicate, yes, but I think two aspects are worth addressing separately: the use of `[Environment]::NewLine` and the fact that - as far as I can tell - the OP has CR-only newlines (which is unusual), so showing how to deal with those by extending the `\r?\n` regex is worthwhile. – mklement0 Dec 16 '21 at 16:20
  • @mklement0 Honest question, how do you create a here-string with carriage return only and no newline on Windows unless specifically using `r? – Santiago Squarzon Dec 16 '21 at 16:27
  • @Cpt.Whale: Note that it is only the (obsolescent) ISE that uses CRLF at the interactive prompt; _terminals_ (regular console windows, Windows Terminal, the integrated terminal in VSCode) use LF-only - _both_ in Windows PowerShell and PowerShell (Core). Quibble: it's better to cast `[char]` instances to `[int]` than to `[byte]` - the latter will break with Unicode characters with code points above `0xff` (e.g. `[byte[]] [char[]] '⚠️'`) – mklement0 Dec 16 '21 at 16:33
  • @SantiagoSquarzon, the only way to make that happen would be to save the enclosing script file with CR-only newlines (if your editor lets you). Definitely unusual (CR-only files may be produced by legacy applications, including pre-OSX/macOS Mac applications), but based on the OP's feedback it strikes me as the only plausible explanation. – mklement0 Dec 16 '21 at 16:37
  • @mklement0 exactly, which explains my point that, we either don't have complete information on _"the issue"_ and based on the code provided as reproducible example by the OP we're unable to reproduce it unless the here-string has been manually constructed using `r for the newlines. Which is why I felt that this question should be closed for duplicate unless we have the complete information on the issue and how we can reproduce it. – Santiago Squarzon Dec 16 '21 at 16:41
  • @Santiago, I think that the _conceptually_ problematic use of `[Environment]::NewLine` alone warrants addressing separately. And I do think that calling out the _possibility_ of CR-only newlines and how to handle them may at least be useful to _future readers_, whose use cases may not involve string _literals_ - whether or not the OP's specific problem was ultimately a different one. (As for reproducing, which I think is moot for the reasons given: It seems unlikely and requires opt-in, but you _can_ produce CR-only `.ps1` files with Notepad++, for instance.) – mklement0 Dec 16 '21 at 16:52

Thanks to this site I found a solution: https://n-v-o.github.io/2021-06-10-String-Method-in-Powershell-7/

In .NET 4, the string class only had methods that took characters as parameter types. PowerShell sees this and automagically converts it, to make life a little easier on you. Note there’s an implied ‘OR’ (|) here as it’s an array of characters.

Why is PowerShell 7 behaving differently? In .NET 5, the string class has some additional parameters that accept strings. PowerShell 7 does not take any automatic action.

In order to fix my problem, I had to use this:

$test.Split("`r").Split("`n") #or
$test.Split([char[]]"`r`n")
Dan Stef
  • You apparently didn't try @SantiagoSquarzon's comment, `-split '\r?\n'`. That uses a regex where the question mark makes the CR optional, so it splits on CRLF, or on LF alone if there is no CR. – Theo Dec 16 '21 at 14:38
  • You are right, I didn't try his answer. But I tried it just now and it's not a solution. It does not work: `$x[1]` is still empty. With my posted solution, it works in both PowerShell 5 and PowerShell 7. – Dan Stef Dec 16 '21 at 14:44
  • @DanStef _"not work"_ in this case applies only for you, that regex will work for anyone else no matter their OS or PS Version. Hence, your issue is not reproducible. – Santiago Squarzon Dec 16 '21 at 14:48
  • @Dan Your answer will split twice on strings with windows line endings: `\r\n`, leading to different results between line ending formats. Depending on your use case, that may not matter. Santiago's answer is better, or you could use `-split '\r\n|\n'` (same, but longer). – Cpt.Whale Dec 16 '21 at 14:49