A string (extracted from an Outlook email message body.innerText) contains embedded newlines. How can I split this into an array of strings?

I would expect this example string to be split into an array of two (2) items. Instead, it becomes an array of three (3) items with a blank line in the middle.

PS C:\src\t> ("This is`r`na string.".Split([Environment]::NewLine)) | % { $_ }
This is

a string.
PS C:\src\t> "This is `r`na string.".Split([Environment]::NewLine) | Out-String | Format-Hex

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   54 68 69 73 20 69 73 20 0D 0A 0D 0A 61 20 73 74  This is ....a st
00000010   72 69 6E 67 2E 0D 0A                             ring...
mklement0
lit
  • Using your code as-is, I get an array with two items. Do you have multiple consecutive newlines in the actual email content? – OwlsSleeping Jul 11 '20 at 17:08
  • the `Out-String` cmdlet adds "stuff" to the output. it's usually things like `cr/lf` at the end of each line. DO NOT use the cmdlet when you want to see what the output of something really is. [*grin*] – Lee_Dailey Jul 11 '20 at 17:19
  • the `.Split()` method splits on EVERY CHAR in the split-on-this string, not on the whole string. so you got a split on the `lf` and on the `cr`. that is where your extra line is coming from. [*grin*] the Answer by mklement0 is one way to avoid that glitch. – Lee_Dailey Jul 11 '20 at 17:22
  • @Lee_Dailey Note that this is no longer true in PowerShell v6+ (a new .NET Core method overload that takes a `[string]` separator now takes precedence) - that's one good reason to always prefer `-split` over `.Split()`. – mklement0 Jul 11 '20 at 17:32
  • @mklement0 - thank you for the reminder [*grin*] ... i had seen that and forgotten it. [*blush*] – Lee_Dailey Jul 11 '20 at 17:34

3 Answers

To treat a CRLF sequence as a whole as the separator, it's simpler to use the -split operator, which is regex-based:

PS> "This is `r`n`r`n a string." -split '\r?\n'
This is 

 a string.

Note:

  • \r?\n matches both CRLF (Windows-style) and LF-only (Unix-style) newlines; use \r\n if you really only want to match CRLF sequences.

    • Note the use of a single-quoted string ('...'), so as to pass the string containing the regex as-is through to the .NET regex engine; the regex engine uses \ as the escape character; hence the use of \r and \n.
  • PowerShell's -split operator is a generally superior alternative to the [string] .NET type's .Split() method - see this answer.
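
As a quick check of the first note, here is a minimal sketch that splits a string containing both newline styles. The sample text and the variable names `$mixed` and `$lines` are made up for illustration.

```powershell
# Hypothetical sample text: one CRLF newline followed by one LF-only newline.
$mixed = "line 1`r`nline 2`nline 3"

# '\r?\n' matches an optional CR followed by an LF, so both styles act as separators.
$lines = $mixed -split '\r?\n'

$lines.Count   # 3
$lines[1]      # line 2 (no stray CR left at the end)
```

With '\n' alone as the regex, the first element would keep a trailing CR, which is easy to miss because it is invisible when printed.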


As for what you tried:

The separator argument, [Environment]::NewLine, on Windows is the string "`r`n", i.e. a CRLF sequence.

  • In PowerShell [Core] v6+, your command does work, because this string as a whole is considered the separator.

  • In Windows PowerShell, as Steven points out in his helpful answer, the individual characters, CR and LF separately, are considered separators, resulting in an extra, empty element (the empty string between the CR and the LF) in the result array.

This change in behavior happened outside of PowerShell's control: .NET Core introduced a new .Split() method overload with a [string]-typed separator parameter, which PowerShell's overload-resolution algorithm now selects over the older overload with the [char[]]-typed parameter.
Sidestepping such inadvertent (albeit rare) behavioral changes, which PowerShell itself cannot prevent, is another good reason to prefer the PowerShell-native -split operator over the .NET [string] type's .Split() method.
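
If .Split() has to be used anyway, one way to avoid depending on overload resolution is to cast the separator explicitly, so that the same overload is selected in both editions. The following is only a sketch of that idea. `$text`, `$asString` and `$asChars` are illustrative names, and the two-argument form is used here because the overloads taking a [StringSplitOptions] argument exist in both .NET Framework and .NET Core.

```powershell
$text = "This is`r`na string."

# [string[]] cast: the CRLF sequence as a whole is the separator.
# Selects the Split(string[], StringSplitOptions) overload.
$asString = $text.Split([string[]] "`r`n", [StringSplitOptions]::None)
$asString.Count   # 2

# [char[]] cast: CR and LF each act as a separator on their own.
# Selects the Split(char[], StringSplitOptions) overload.
$asChars = $text.Split([char[]] "`r`n", [StringSplitOptions]::None)
$asChars.Count    # 3, with an empty string between the CR and the LF
```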

mklement0

This is because the .Split() method splits on any of the characters in the separator string. For example:

"first part of a string-*second part of a string".Split("-*")

Output:

first part of a string

second part of a string

The extra element is an empty string inserted between the 2 split characters.

(credit to @mklement0, for correcting that)

So I can only assume this is a result of a couple of factors. First, [Environment]::NewLine consists of both characters, carriage return and line feed, and the line coming from Outlook is indeed using that line-ending sequence. All to be expected on Windows.

There are 2 solutions I can think of:

Option 1:

.Split([Environment]::NewLine, [StringSplitOptions]::RemoveEmptyEntries)

This obviously sticks with the same .Split() method, but the added parameter will kill the extra element.
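
One caveat with this option: RemoveEmptyEntries cannot tell the artifact apart from a genuinely blank line, so intentional empty lines are dropped as well. A small sketch of the difference follows. The sample `$text` is made up, and the [char[]] cast is an addition for illustration that makes the per-character splitting explicit in every PowerShell edition.

```powershell
# Made-up sample: two lines separated by one deliberately blank line.
$text = "first`r`n`r`nlast"

# Split on CR and LF individually, discarding every empty element.
$noEmpties = $text.Split([char[]] "`r`n", [StringSplitOptions]::RemoveEmptyEntries)
$noEmpties.Count   # 2: the blank line is gone

# A regex-based split on whole newlines keeps the blank line as an empty string.
$withBlank = $text -split '\r?\n'
$withBlank.Count   # 3
```

Whether that matters depends on the email content: if blank lines carry meaning (paragraph breaks, for instance), Option 2 preserves them.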

Option 2:

Use the PowerShell -split operator which matches the split delimiter using a RegEx:

"This is`r`na string." -split "`r`n"
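
Because -split interprets its right-hand side as a regex, separators that contain regex metacharacters need to be escaped or matched literally. A hedged sketch, reusing the "-*" separator from the earlier .Split() example (the variable names are illustrative):

```powershell
$text = 'first part of a string-*second part of a string'

# [regex]::Escape() produces a pattern that matches the two characters '-*' literally.
$viaEscape = $text -split [regex]::Escape('-*')
$viaEscape.Count   # 2

# Alternative: the SimpleMatch option makes -split take the separator literally.
# The 0 means "no limit on the number of substrings".
$viaSimple = $text -split '-*', 0, 'SimpleMatch'
$viaSimple.Count   # 2
```

Either way the separator is treated as a whole, so no empty element appears between the two characters.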
Steven
  • As for `.Split()` splitting on any character: Note that this is no longer true in PowerShell v6+ (a new .NET Core method overload that takes a `[string]` separator now takes precedence) - that's one good reason to always prefer `-split` over `.Split()`. – mklement0 Jul 11 '20 at 17:35

Hello,

I'm a big newbie in PowerShell, but ...
I have written this:

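$str_1 ="This is

a string."
# .Split() without arguments splits on any whitespace (spaces and newlines).
$splt_1=$str_1.Split()
$cnt_1=1
$Result_1=""
foreach ($item in $splt_1) {
     $regEx="[a-zA-Z]"
     if ($item -cmatch $regEx){
          # A word: append it, followed by a space.
          $Result_1=$Result_1+"$item "
     } elseif ($cnt_1 -eq 1) {
          # First empty element (from the newlines): insert a "|" marker, once only.
          $Result_1=$Result_1+"| "
          $cnt_1=$cnt_1+1
     }
}
Write-Host $Result_1
## OUTPUT ##
# This is | a string.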

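$str_2="
This is

....a st

ring...
"
# Same approach, but for a string with several groups of newlines.
$splt_2=$str_2.Split()
$cnt_2=1
$Result_2=""
foreach ($item in $splt_2) {
     $regEx="[a-zA-Z]"
     if ($item -cmatch $regEx){
          # A word: reset the counter so the next newline group gets its own marker.
          $cnt_2=1
          $Result_2=$Result_2+"$item "
     } elseif ($cnt_2 -eq 1) {
          # First empty element of a newline group: insert one "|" marker.
          $Result_2=$Result_2+"| "
          $cnt_2=$cnt_2+1
     }
}
Write-Host $Result_2
## OUTPUT ##
# | This is | ....a st | ring... |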

I hope it helps ...

PS:

I just realized that I forgot the result .....


$Result_1.Split("|")
## OUTPUT ##
This is
 a string.

 $Result_2.Split("|")
 ## OUTPUT ##
 This is
 ....a st
 ring...

EOF

AxelEric.

  • After you have seen the other 2 answers that solved the problem with a one-liner and some explanation, what made you think you have to post this answer? Could you add some explanation on how you solve the problem and why so many lines of code are needed to solve this simple task? What are the advantages of your solution? – stackprotector Jul 12 '20 at 10:15