
I'm trying to extract a 6-digit numeric string from a paragraph of text using PowerShell, but it works in only one scenario. The 6-digit string sits inside a paragraph of text on the Windows clipboard, and I expect $Matches[0] to contain those 6 digits, yet the result is always blank.

With line 2 active, $Matches[0] is always the 6-digit code from line 2, i.e. 123456, as shown below. But if I comment out line 2, copy a paragraph of text from my real-world example, and re-run the code, $Matches[0] is blank instead of the expected 6-digit string. I walk through both examples and their outputs below. I'm not sure what I'm doing wrong.

Working Example:

$Matches[0] = $null
Set-Clipboard -value "Your PIN is 123456."
$PIN = (Get-Clipboard) -match '\d{6}'
# Get-Clipboard
Write-Output $Matches[0]

The above code outputs the following, as expected:

[Screenshot: Working Example]

Non-working Example: If I comment out line 2:

$Matches[0] = $null
# Set-Clipboard -value "Your PIN is 123456."
$PIN = (Get-Clipboard) -match '\d{6}'
# Get-Clipboard
Write-Output $Matches[0]

and given this paragraph of text, copied into the Windows Clipboard:

Hello,

Your authentication code is 351370

This code will expire in 20 minutes to keep your account secure.

The output shows blank, instead of the expected 351370:

[Screenshot: Non-working Example]

Thoughts?

T-Heron

2 Answers

PS C:\> Get-Clipboard
Hello,

Your authentication code is 351370

This code will expire in 20 minutes to keep your account secure.

PS C:\> ([regex]'\d{6}').Match((Get-Clipboard)).Value
351370

Edit: Sorry, I should explain a bit more. When using a regex to search for a specific pattern within a string, you can use .Match if you want the first occurrence of the pattern, or .Matches to find all occurrences. Example:

PS C:\> $re=[regex]"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"

PS C:\> $re.match('string string string string string 
string string string 127.0.0.1 string string 
string 192.168.1.1 string string string string 
string string string 255.255.255.0 string string 
string string string string string string ').value

127.0.0.1

PS C:\> $re.matches('string string string string string 
string string string 127.0.0.1 string string 
string 192.168.1.1 string string string string 
string string string 255.255.255.0 string string 
string string string string string string ').value

127.0.0.1
192.168.1.1
255.255.255.0
Santiago Squarzon
    That works, and showing the `.Matches` variant is a nice addition, given that `-match` only ever finds _one_ match, but there's a caveat: Because `Get-Clipboard` returns a multiline string as an _array_ of lines, passing an array to the `.Match()` method's `string` parameter causes PowerShell to _stringify_ the array, which means _joining the elements with spaces_. For instance, if `"line 1\`nline 2"` is on the clipboard, `Get-Clipboard` returns array `'line 1', 'line 2'` and stringifies that to space-separated single-line string `'line 1 line 2'`, which can lead to false positives. – mklement0 Mar 13 '21 at 14:45
    To illustrate the problem: `([regex] '1 l').Match(('line 1', 'line 2')).Success` yields `$true`. You can make `Get-Clipboard` return a multiline string as-is with the `-Raw` switch, which is the only thing missing from T-Heron's own solution attempt with `-match`. – mklement0 Mar 13 '21 at 14:46

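The problem: Get-Clipboard returns multiline text as an array of lines. With an array as the left-hand operand, -match acts as a filter: it returns the matching elements rather than a Boolean, and it does not populate $Matches.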
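To see why $Matches stays empty in the question's code, here is a minimal sketch that avoids the real clipboard. The $lines array is a hypothetical stand-in for what Get-Clipboard returns for multiline text (one string per line), and the joined $text only approximates what -Raw would return.

```powershell
# Hypothetical stand-in for Get-Clipboard output: one array element per line.
$lines = 'Hello,', '', 'Your authentication code is 351370', '',
         'This code will expire in 20 minutes to keep your account secure.'

# Array on the left: -match filters the array and returns the matching
# elements. It does not return a Boolean and does not populate $Matches.
$filtered = $lines -match '\d{6}'
$filtered    # the whole matching line, not just the digits

# Single string on the left: -match returns a Boolean and populates $Matches.
$text  = $lines -join "`n"    # assumption: approximates Get-Clipboard -Raw
$found = $text -match '\d{6}'
$code  = if ($found) { $Matches[0] }
$code        # the 6 digits only
```

In the question's working example the clipboard holds a single line, so Get-Clipboard returns a single string and -match behaves as in the second half of the sketch.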

The solution is to request the text on the clipboard as a single, multiline string, using the -Raw switch:

if ((Get-Clipboard -Raw) -match '\d{6}') {
  $Matches[0] # -> '351370'
}

An alternative is to use -replace, the regex-based string-replacement operator, which requires matching the entire string and replacing it with what a capture group matched:

@'
Hello,

Your authentication code is 351370

This code will expire in 20 minutes to keep your account secure
'@ -replace '(?s).*(\d{6}).*', '$1' # -> 351370

Note:

  • Inline option s (SingleLine; (?s)) ensures that . also matches newline (\n) characters, to enable matching across all lines of a multiline string.

  • In the replacement operand, $1 refers to what the first (and only) capture group ((...)) captured.

  • Caveat: if the regex does not match the input, the input string is returned as-is.
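The caveat in the last bullet can be sketched as follows. The sample strings and the helper name Get-SixDigitCode are hypothetical, made up for illustration; the point is only that an unmatched input passes through -replace unchanged, so a guard with -match may be safer when the text might not contain a code.

```powershell
# Hypothetical input that contains no 6-digit run.
$noCode = 'Hello, there is no code in this message.'

# No match, so -replace returns the input unchanged rather than an empty string.
$result = $noCode -replace '(?s).*(\d{6}).*', '$1'
$unchanged = $result -ceq $noCode

# Hypothetical helper: outputs the first 6-digit run, or nothing if there is none.
function Get-SixDigitCode {
    param([string] $Text)
    if ($Text -match '\d{6}') { $Matches[0] }
}

$hit  = Get-SixDigitCode "Hello,`n`nYour authentication code is 351370"
$miss = Get-SixDigitCode $noCode
```

With this guard, a missing code shows up as an empty result instead of the whole message being mistaken for the code.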


Finally, direct use of .NET APIs via the [regex] class (System.Text.RegularExpressions.Regex) is another alternative, as shown in santisq's answer, but it requires advanced knowledge outside the realm of PowerShell's own commands and operators.
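For completeness, a short sketch of the [regex] route applied to a single multiline string, which sidesteps the array-stringification issue. The $text value is a hypothetical stand-in for the output of Get-Clipboard -Raw.

```powershell
# Hypothetical stand-in for (Get-Clipboard -Raw): one multiline string.
$text = "Hello,`n`nYour authentication code is 351370`n`n" +
        "This code will expire in 20 minutes to keep your account secure."

# [regex]::Match returns the first match; check .Success before using .Value.
$m = [regex]::Match($text, '\d{6}')
$code = if ($m.Success) { $m.Value }

# [regex]::Matches returns every match; here, all runs of digits in the text.
$allNumbers = [regex]::Matches($text, '\d+').Value
```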

mklement0