3

I have a | delimited file and I have some data where for null values it has a space. So, in my data file I'll have something like this:

2080| | | | | | | | | | | | | |2000225

I tried this:

-replace '\| \|', '||'

but it matches pairs of | and still leaves the space when it's done between |. I'm just not really good with regex and totally new to Powershell.

2080|| || || ....|2000225

I'm not sure if recursion would solve this or if I'm going to need to write a short Java program to do it.

mklement0
  • 382,024
  • 64
  • 607
  • 775
jmorga11
  • 33
  • 3

3 Answers3

4

You can use the regex-based -replace operator as follows:

PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (\||$)', '$1'
|2080||||||||||||||2000225|

This assumes that no non-empty fields have trailing spaces - if they do, their (last) trailing space will be removed; to avoid this, use the appropriate solution from Wiktor Stribiżew's helpful answer.

Regex (\||$) matches a single space char. followed by either a literal | (escaped as \|) or (|) the end of the string ($); $1 in the replacement string then replaces whatever the 1st capture group ((...)) matched; that is, if the space char. was followed by literal |, it is effectively replaced with just |; if it was followed by the end of the string, it is effectively removed.


A slight simplification is to use a positive lookahead assertion ((?=...)), as also used in Wiktor's answer, which captures the space character only, and therefore allows omission of the substitution-text -replace operand, which defaults to the empty string and therefore effectively removes the spaces:

PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace ' (?=\||$)'
|2080||||||||||||||2000225|
mklement0
  • 382,024
  • 64
  • 607
  • 775
  • 1
    Thank you for the answers and explanation. Trailing spaces aren't significant in our data and if there are any they can be trimmed as well. – jmorga11 Jul 15 '20 at 18:31
1

Using -replace with a regex based search, you may....

Remove all whitespace between two | chars:

$text -replace '(?<=\|)\s+(?=\|)'

To only remove spaces in between | and start/end of string

$text -replace '(?<=\||^)\s+(?=\||$)'
$text -replace '(?<![^|])\s+(?![^|])'

Remove all whitespace characters that are either followed with | or end of string

$text -replace '\s+(?=\||$)'
$text -replace '\s+(?![^|])'

Output: 2080||||||||||||||2000225. See the regex demo.

Details

  • \s+ - 1 or more whitespace characters
  • (?=\||$) - a positive lookahead that requires a | char (\|) or (|) end of string ($) immediately to the right of the current location.
  • (?![^|]) - a negative lookahead that fails the match if there is a char other than | immediately to the right of the current location.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • That will do extra changes that probably are not desired, as matching some field ended in spaces, like `|foo | bar|`. – Luis Colorado Jul 15 '20 at 08:39
  • @LuisColorado Probably you mean I should have used `$text -replace '(?<=\||^)\s+(?=\||$)'` in the answer? To only remove spaces in between `|` and start/end of string? Or `$text -replace '(?<=\|)\s+(?=\|)'` to only remove spaces between `|` chars? Added all the variations. – Wiktor Stribiżew Jul 15 '20 at 08:54
1

You don't need to run a recursive function to do that. Just run it twice. The problem is that once you match | |, you are past the start of the next occurence. In the first pass, you leave all the ocurrences of | | | (so after the first match <| |> |, you will have | as starting point for new matches, which doesn't match) for the second one... of if you have more, you left without matching all the even occurences that are stuck together. If you run it only a second time, you'll match and change all those matches you left the first time. Run it a second time and you'll see that it works.

Just do:

PS> ' |2080| | | | | | | | | | | | | |2000225| ' -replace '| |', '||' -replace '| |', '||'
|2080||||||||||||||2000225|

You won't need more.

Luis Colorado
  • 10,974
  • 1
  • 16
  • 31
  • There are about 150 fields and in test we had to run it 5 times given the test data and the number of empty fields but that could change so running it multiple times and hoping it catches all of them is not really practical. – jmorga11 Jul 15 '20 at 18:26
  • normally if you have all of them completely stuck to each other, the first run will take all the odd numbered, the next will catch all the even ones.... I don't understand you assertion of running them 5 times... if they are scattered around the line, you will end it in just one pass. You need to pass twice to get the even ones... just once. – Luis Colorado Jul 21 '20 at 09:37
  • I've just tried it with `vi(1)` and a very long string full of different ways to spread the `| |` patterns, and more than 1000 fields in total, and *it took only two executions* of the pattern substitution command 's/| |/||/g' to make all to contract into `||`. Have you tried my solution, or you just don't believe that executing the pattern substitution only one more time you'll achieve the desired results? – Luis Colorado Jul 21 '20 at 09:44