I'm trying to extract data using Regex positive lookbehind. I have created a .ps1 file with the following content:

$input_path = ‘input.log’

$output_file = ‘Output.txt’

$regex = ‘(?<=    "name": ")(.*)(?=",)|(?<=    "fullname": ")(.*)(?=",)|(?<=Start identity token validation\r\n)(.*)(?=ids: Token validation success)|(?<=  "ClientName": ")(.*)(?=",\r\n  "ValidateLifetime": false,)’

select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } >$output_file

The input file looks like this:

08:15.27.47-922: T= 11 ids: Start end session request
08:15.27.47-922: T= 11 ids: Start end session request validation
08:15.27.47-922: T= 11 ids: Start identity token validation
08:15.27.47-922: T= 11 ids: Token validation success
{
  "ClientId": "te_triouser",
  "ClientName": "TE Trio User",
  "ValidateLifetime": false,
  "Claims": {
    "iss": "http://sv-trio17.adm.linkoping.se:34000/core/",
    "aud": "te_triouser",
    "exp": "1552054900",
    "nbf": "1552054600",
    "nonce": "f1ae9044-25f9-4e7f-b39f-bd7bdcb9dc8d",
    "iat": "1552054600",
    "at_hash": "Wv_7nNe42gUP945FO4p0Wg",
    "sid": "9870230d92cb741a8674313dd11ae325",
    "sub": "23223",
    "auth_time": "1551960154",
    "idp": "tecs",
    "name": "tele2",
    "canLaunchAdmin": "1",
    "isLockedToCustomerGroup": "0",
    "customerGroupId": "1",
    "fullname": "Tele2 Servicekonto Test",
    "tokenIdentifier": "2Ljta5ZEovccNlab9QXb8MPXOqaBfR6eyKst/Dc4bF4=",
    "tokenSequence": "bMKEXP9urPigRDUguJjvug==",
    "tokenChecksum": "NINN0DDZpx7zTlxHqCb/8fLTrsyB131mWoA+7IFjGhAV303///kKRGQDuAE6irEYiCCesje2a4z47qvhEX22og==",
    "idpsrv_lang": "sv-SE",
    "CD_UserInfo": "23223 U2 C1",
    "amr": "optional"
  }
}

If I run the regex through http://regexstorm.net/tester I get the right matches. But when I run my script with PowerShell on my computer, I don't get the matches for the alternatives that contain \r\n. I only get the matches from the first two alternatives.

EGAD
  • 1
    Are you sure that your file has Windows-style line endings? – Piotr Stapp Mar 22 '19 at 11:43
  • 3
    Part of the problem here may be using the `-Path` parameter of the `Select-String` cmdlet. That will read each line into a string array. You may have better success if you read the entire file in as a single string. You can test this with `Get-Content input.log -Raw | Select-String -Pattern $regex`. – AdminOfThings Mar 22 '19 at 11:58

1 Answer

  • I agree with @AdminOfThings: use Get-Content with the -Raw parameter.
  • Also, don't use typographic quotes in scripts.
  • If the number of leading spaces isn't really fixed, replace them with a single space followed by a + or * quantifier.
  • Make the \r optional: \r?.
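To illustrate the first and last points, here is a minimal sketch using a hypothetical two-line sample string (not the real log). It assumes `Select-String -Path` behaves like matching each line separately, as described in the comments, which is why a pattern containing a line break only matches once the text is handed over as one string:

```powershell
# Hypothetical two-line sample, joined with CRLF as a Windows log would be.
$text = "T= 11 ids: Start identity token validation`r`nT= 11 ids: Token validation success"

# \r? lets the same pattern work for both CRLF and LF line endings.
$pattern = '(?<=Start identity token validation\r?\n)(.*)(?=ids: Token validation success)'

# Line by line: no single line contains a line break, so the lookbehind cannot match.
$perLine = @($text -split '\r?\n' | Select-String -Pattern $pattern -AllMatches)

# As one string (what Get-Content -Raw produces): the lookbehind can see the line break.
$whole = $text | Select-String -Pattern $pattern -AllMatches

"Per-line matches: $($perLine.Count)"
"Whole-string match: '$($whole.Matches[0].Value)'"
```

The per-line variant finds nothing, while the whole-string variant returns the text between the line break and "ids: Token validation success".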

A minimal complete verifiable example should also include your expected output.

EDIT: changed the regex to be more readable.

The following script

## Q:\Test\2019\03\22\SO_55298614.ps1

$input_path = 'input.log'
$output_file = 'Output.txt'

$regexes = ('(?<= *"(full)?name": ")(.*)(?=",)',
            '(?<=Start identity token validation\r?\n)(.*)(?=ids: Token validation success)',
            '(?<= *"ClientName": ")(.*)(?=",\r?\n *"ValidateLifetime": false,)')

$regex = [RegEx]($regexes -join '|')


Get-Content $input_path -Raw | Select-String -pattern $regex -AllMatches | 
   ForEach-Object { $_.Matches.Value }

yields this sample output:

> Q:\Test\2019\03\22\SO_55298614.ps1
08:15.27.47-922: T= 11
TE Trio User
tele2
Tele2 Servicekonto Test
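
The script above prints to the console, whereas the question redirected the matches to $output_file. A possible way to do that is sketched below. The sample log is a shortened, hypothetical excerpt written to the temp folder only so the sketch runs on its own; in the real script $input_path and $output_file would stay as defined above.

```powershell
# Hypothetical shortened sample, written to the temp folder so the sketch is runnable.
$dir         = [System.IO.Path]::GetTempPath()
$input_path  = Join-Path $dir 'input.log'
$output_file = Join-Path $dir 'Output.txt'

$sample = @(
    '08:15.27.47-922: T= 11 ids: Start identity token validation',
    '08:15.27.47-922: T= 11 ids: Token validation success',
    '{',
    '  "ClientName": "TE Trio User",',
    '  "ValidateLifetime": false,',
    '    "name": "tele2",',
    '    "fullname": "Tele2 Servicekonto Test",',
    '    "amr": "optional"',
    '}'
) -join "`r`n"
Set-Content -Path $input_path -Value $sample

# Same three alternatives as in the script above.
$regexes = ('(?<= *"(full)?name": ")(.*)(?=",)',
            '(?<=Start identity token validation\r?\n)(.*)(?=ids: Token validation success)',
            '(?<= *"ClientName": ")(.*)(?=",\r?\n *"ValidateLifetime": false,)')
$regex = [RegEx]($regexes -join '|')

# Read the file as one string, collect every match, and write one match per line.
Get-Content $input_path -Raw |
    Select-String -Pattern $regex -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Set-Content -Path $output_file

Get-Content $output_file
```

Set-Content is used here instead of the > redirection from the question; either should work, but Set-Content makes the target path explicit.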
  • 1
    Nicely done. As an aside: While it makes sense to avoid typographic quotes, it's important to understand that they _are_ supported in PowerShell, and that problems relating to them usually stem from _encoding_ problems - see https://stackoverflow.com/a/55053609/45375 – mklement0 Mar 22 '19 at 13:19
  • Oh, I forgot to include my expected output. Sorry about that, but it seems you got it anyway. Thank you, this was exactly the output I needed. I understand the '(?<= *"(full)?name": ")(.*)(?=",)' – EGAD Mar 25 '19 at 09:39