
I am trying to match some strings using a regular expression in PowerShell, but I'm having difficulty because the format of the source strings varies. Admittedly, I am not very strong at writing regular expressions.

I need to extract the numbers from each of these strings. The numbers can vary in length, but in both cases they are preceded by FOO:

PC1-FOO1234567
PC2-FOO1234567/FOO98765

This works for the second example:

'PC2-FOO1234567/FOO98765' -match 'FOO(.*?)\/FOO(.*?)\z'

It lets me access the matched strings using $matches[1] and $matches[2] which is great.

It obviously doesn't work for the first example. I suspect I need some way to match on either / or the end of the string, but I'm not sure how to do this and still end up with the matches I want.

Suggestions?

mklement0
YEMyslf

3 Answers


You may use

'FOO(.*?)(?:/FOO(.*))?$'

It matches FOO, captures any zero or more chars, as few as possible, into Group 1, and then optionally matches /FOO followed by any zero or more chars, as many as possible, captured into Group 2. The end of the string must follow.

Details

  • FOO - literal substring
  • (.*?) - Group 1: any zero or more chars other than newline, as few as possible
  • (?:/FOO(.*))? - an optional non-capturing group matching 1 or 0 repetitions of:
    • /FOO - a literal substring
    • (.*) - Group 2: any 0+ chars other than newline as many as possible (* is greedy)
  • $ - end of string.
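
As a quick check, the pattern can be applied to both sample strings from the question. This is only a minimal sketch: the variable names ($samples, $results) and the property names are made up for illustration.

```powershell
# Sample inputs taken from the question.
$samples = 'PC1-FOO1234567', 'PC2-FOO1234567/FOO98765'

$results = foreach ($s in $samples) {
    if ($s -match 'FOO(.*?)(?:/FOO(.*))?$') {
        # $Matches only holds groups that took part in the match,
        # so $Matches[2] is $null when the optional /FOO part is absent.
        [pscustomobject]@{
            Text   = $s
            First  = $Matches[1]
            Second = $Matches[2]
        }
    }
}

$results
```

For the first string only First is populated; for the second string both First and Second are.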

Wiktor Stribiżew

[edit - removed the unneeded pipe to Where-Object. thanks to mklement0 for that! [*grin*]]

this is a somewhat different approach. it splits on the foo, then replaces the unwanted / with nothing, and finally filters out any string that contains letters.

the pure regex solutions others offered will likely be faster, but this may be slightly easier to understand - and therefore to maintain. [grin]

# fake reading in a text file
#    in real life, use Get-Content
$InStuff = @'
PC1-FOO1234567
PC2-FOO1234567/FOO98765
'@ -split '\r?\n'  # handles both LF and CRLF line endings

$InStuff -split 'foo' -replace '/' -notmatch '[a-z]'

output ...

1234567
1234567
98765
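
to make the chain easier to follow, here is a rough sketch of the same steps done one at a time. the intermediate variable names are made up for illustration, and the input is given as a plain array instead of a here-string.

```powershell
$InStuff = 'PC1-FOO1234567', 'PC2-FOO1234567/FOO98765'

# step 1: split on 'foo' (-split is case-insensitive by default)
#    gives: PC1-, 1234567, PC2-, 1234567/, 98765
$AfterSplit = $InStuff -split 'foo'

# step 2: -replace with no replacement string removes every '/'
#    gives: PC1-, 1234567, PC2-, 1234567, 98765
$AfterReplace = $AfterSplit -replace '/'

# step 3: -notmatch on an array keeps only the items that contain no letters
#    gives: 1234567, 1234567, 98765
$NumbersOnly = $AfterReplace -notmatch '[a-z]'

$NumbersOnly
```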
Lee_Dailey

To offer a more concise alternative with the -split operator, which obviates the need to access $Matches afterwards to extract the numbers:

PS> 'PC1-FOO1234568', 'PC2-FOO1234567/FOO98765' -split '(?:^PC\d+-|/)FOO' -ne ''
1234568  # single match from 1st input string
1234567  # first of 2 matches from 2nd input string
98765

Note: -split always returns a [string[]] array, even if only 1 string is returned; result strings from multiple input strings are combined into a single, flat array.

  • ^PC\d+-|/ matches either PC followed by 1 or more (+) digits (\d) and a - at the start of the string (^), or (|) a / char.; combined with the FOO that follows, the regex matches both PC2-FOO at the beginning and /FOO.

    • (?:...), a non-capturing subexpression, must be used to prevent -split from including what the subexpression matched in the results array.
  • -ne '' filters out the empty elements that result from the input strings starting with a separator.
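
To illustrate why the group has to be non-capturing, here is a minimal sketch that runs the same split with a capturing and a non-capturing group; the variable names are hypothetical and chosen only for this comparison.

```powershell
$s = 'PC2-FOO1234567/FOO98765'

# Capturing group: -split also emits the text each group captured.
$withCapture = $s -split '(^PC\d+-|/)FOO' -ne ''

# Non-capturing group: only the text between the separators is returned.
$withoutCapture = $s -split '(?:^PC\d+-|/)FOO' -ne ''

$withCapture -join ', '     # PC2-, 1234567, /, 98765
$withoutCapture -join ', '  # 1234567, 98765
```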


To learn more about the regex-based -split operator and in what ways it is more powerful than the string literal-based .NET String.Split() method, see this answer.

mklement0