
I want to split each line of a pipe on spaces, and then print each token on its own line.

I realise that I can get this result using:

(cat someFileInsteadOfAPipe).split(" ")

But I want more flexibility. I want to be able to do just about anything with each token. (I used to use AWK on Unix, and I'm trying to get the same functionality.)

I currently have:

echo "Once upon a time there were three little pigs" | %{$data = $_.split(" "); Write-Output "$($data[0]) and whatever I want to output with it"}

Which, obviously, only prints the first token. Is there a way for me to for-each over the tokens, printing each in turn?

Also, the %{$data = $_.split(" "); Write-Output "$($data[0])"} part I got from a blog, and I really don't understand what I'm doing or how the syntax works.

I want to google for it, but I don't know what to call it. Please help me out with a word or two to Google, or a link explaining to me what the % and all the $ symbols do, as well as the significance of the opening and closing brackets.

I realise I can't actually use (cat someFileInsteadOfAPipe).split(" "), since the file (or, preferably, the incoming pipe) contains more than one line.

Regarding some of the answers:

If you are using Select-String to filter the output before tokenizing, you need to keep in mind that the output of the Select-String command is not a collection of strings, but a collection of MatchInfo objects. To get to the string you want to split, you need to access the Line property of the MatchInfo object, like so:

cat someFile | Select-String "keywordFoo" | %{$_.Line.Split(" ")}
Peter Mortensen
Pieter Müller

4 Answers

"Once upon a time there were three little pigs".Split(" ") | ForEach {
    "$_ is a token"
}

The key is $_, the automatic variable that holds the current object in the pipeline.

About the code you found online:

% is an alias for ForEach-Object. Anything enclosed in the braces ({ ... }, called a script block) is run once for each object it receives. In this case, it only runs once, because you're sending it a single string.

$_.Split(" ") takes the current object and splits it on spaces. The current object is whatever ForEach-Object is currently looping over.
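To search for these pieces, useful terms are "ForEach-Object", "script block" and "automatic variable $_". As a minimal sketch of how they fit together for multi-line input (the two sample lines are made up for illustration, standing in for a file or an incoming pipe):

```powershell
# Hypothetical sample input: two lines standing in for a file or an incoming pipe.
$lines = 'Once upon a time', 'there were three little pigs'

# The outer ForEach-Object (alias: %) runs its script block { ... } once per line;
# inside that block, $_ is the current line.
$tokens = $lines | ForEach-Object {
    # The inner ForEach-Object runs once per token; in here, $_ is the current token.
    $_.Split(' ') | ForEach-Object { "$_ is a token" }
}

# Emit one string per token, across both lines.
$tokens
```

Because each script block gets its own $_, the inner block sees tokens while the outer block sees whole lines.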

mklement0
Justus Grunow
  • aaaah, thanks for the edit. Knowing that `%` is short for `foreach-object` means I can do this for multiple lines: `cat .\tmp.txt | %{$_.Split(" ")} | %{Write-Output "$($_) hello"}` Problem solved. – Pieter Müller Jul 05 '12 at 16:47
  • Perfect! Glad I could help. The last part of your command could actually just be `"$_ hello"`. You only need to use the $($variable) notation if you're trying to expand the value of an object's property inside a string. For example `"My last name is $($person.surname)."` Or the output of a cmdlet's method: `"Tomorrow's date is $((Get-Date).AddDays(1))"`. – Justus Grunow Jul 05 '12 at 17:03
  • Just a note: As of PowerShell v2 there is a `-split` operator which can be used to split on whitespace in general (`-split $foo`) or analogous to `.Split(' ')`: `$foo -split ' '`. – Joey Jul 06 '12 at 07:59

To complement Justus Grunow's helpful answer:

  • As Joey notes in a comment, PowerShell has a powerful, regex-based -split operator.

    • In its unary form (-split '...'), -split behaves like awk's default field splitting, which means that:
      • Leading and trailing whitespace is ignored.
      • Any run of whitespace (e.g., multiple adjacent spaces) is treated as a single separator.
  • In PowerShell v4+, an expression-based (and therefore faster) alternative to the ForEach-Object cmdlet became available: the intrinsic .ForEach() method, alongside the .Where() method, a more powerful, expression-based alternative to Where-Object.

Here's a solution based on these features:

PS> (-split '   One      for the money   ').ForEach({ "token: [$_]" })
token: [One]
token: [for]
token: [the]
token: [money]

Note that the leading and trailing whitespace was ignored, and that the multiple spaces between One and for were treated as a single separator.
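For the awk-style, per-line field processing the question asks about, the same features can be combined roughly as follows. This is only a sketch: the sample lines and the variable names ($lines, $fields, $report, $long) are invented for illustration.

```powershell
# Hypothetical input lines with irregular whitespace (not from the original post).
$lines = '  alice   30  London ', 'bob 25    Paris'

$report = $lines | ForEach-Object {
    # Unary -split ignores leading/trailing whitespace and splits on whitespace runs.
    $fields = -split $_
    # awk's $1 and $3 correspond to $fields[0] and $fields[2] (indexes are zero-based).
    "$($fields[0]) lives in $($fields[2])"
}
$report

# .Where() filters and .ForEach() transforms, both without a pipeline (PowerShell v4+).
$long = (-split '   One      for the money   ').Where({ $_.Length -gt 3 }).ForEach({ $_.ToUpper() })
$long
```

Here $report holds one sentence per input line, and $long keeps only the tokens longer than three characters, upper-cased.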

mklement0

-split outputs an array, and you can save it to a variable like this:

$a = -split 'Once  upon    a     time'
$a[0]

Once

Another cute thing: you can put a list of variables on the left side of an assignment statement, and each variable receives one element of the array on the right:

$a,$b,$c = -split 'Once  upon    a'
$c

a
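
A related detail worth knowing: when the right side has more elements than there are variables, the last variable collects the remainder as an array, and when it has fewer, the extra variables end up as $null. A small sketch (the variable names and sample strings are arbitrary):

```powershell
# Five tokens, three variables: the last variable receives everything left over.
$first, $second, $rest = -split 'Once  upon    a   time  there'
$first        # the first token
$rest.Count   # number of leftover tokens stored in $rest

# Two tokens, three variables: the unmatched variable is $null.
$x, $y, $z = -split 'Once upon'
$null -eq $z
```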
js2010

Another way to accomplish this is a combination of Justus Grunow's and mklement0's answers. It doesn't make much sense to do it this way in a one-liner example, but when you're trying to mass-edit a file or a bunch of filenames it comes in pretty handy:

$test = '   One      for the money   '
$option = [System.StringSplitOptions]::RemoveEmptyEntries
$($test.split(' ',$option)).foreach{$_}

This will come out as:

One
for
the
money
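
To see what the RemoveEmptyEntries option changes, the two calls can be compared side by side. This is a sketch rather than part of the original answer; the names $sep, $plain and $clean are made up, and the separator is passed as a char array so that the same overload is selected regardless of the .NET version underneath.

```powershell
$test = '   One      for the money   '
$sep  = [char[]]' '   # a one-element char array containing a space

# Without options, every extra space produces an empty string in the result.
$plain = $test.Split($sep)

# With RemoveEmptyEntries, those empty strings are dropped.
$clean = $test.Split($sep, [System.StringSplitOptions]::RemoveEmptyEntries)

"plain: $($plain.Count) elements, clean: $($clean.Count) elements"
$clean
```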
s31064
  • I keep finding that I get the wrong number when using a plain text file with one line containing one computer name (hostname) and one blank line. `$counterTotal = $($computers.Split(" ").count)` gives me exactly what I want. Thanks for the inspiration @s31064 – Aubs Aug 23 '19 at 14:43