3

I have a file that looks like this:

SPECIMEN: Procedure: xxxx1 A) Location: yyyy2
Major zzz B) Location: something
text here C) more


CLINICAL DIAGNOSIS: xyz

The newlines are CR followed by LF.

I'm trying to write a regex that captures everything from the end of Procedure: up to the start of CLINICAL DIAGNOSIS, but I'm having trouble matching across multiple lines.

Here's what I have:

$input_file = 'c:\Path\0240188.txt'
$regex = '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
select-string -Path $input_file -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }

Which doesn't return anything.

If I change the line to:

$regex = '(?m)^SPECIMEN: Procedure: (.*)'

It grabs the first line, but not the rest. I assumed (?m) was supposed to grab multiple lines for me.

Any tips?

JBurace

5 Answers

1

(?m) only changes the ^ and $ anchors so that they match at the beginning and end of each line; it does not let . match across lines. You want the inline (?s) modifier, which makes the dot match every character, including line breaks.

$regex = '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
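As a minimal sketch of the difference between the two modifiers, the snippet below applies both patterns to an in-memory string rather than a file. The `$text` sample is an assumption that mirrors the file shown in the question, with CRLF line endings.

```powershell
# Sample text with CRLF line endings, mirroring the file in the question (assumed content).
$text = "SPECIMEN: Procedure: xxxx1 A) Location: yyyy2`r`nMajor zzz B) Location: something`r`ntext here C) more`r`n`r`n`r`nCLINICAL DIAGNOSIS: xyz"

# (?m) only changes ^ and $; the dot still stops at the line feed, so this cannot match.
$multiline = [regex]::Match($text, '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

# (?s) lets the dot cross line breaks, so the capture spans every line.
$singleline = [regex]::Match($text, '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

$multiline.Success     # False
$singleline.Success    # True

# Trim() drops the trailing blank lines before CLINICAL DIAGNOSIS.
$captured = $singleline.Groups[1].Value.Trim()
$captured
```

Note that this only works when the pattern sees the whole text as one string; a pattern applied one line at a time cannot match across lines regardless of the modifier.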
hwnd
  • This didn't give me any results. If I took out the `CLINICAL DIAGNOSIS:` in your line, it only ended up returning `SPECIMEN: Procedure: `; the `?` seems to be part of the issue causing this? – JBurace Sep 11 '14 at 20:01
1

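It seems that Select-String with -Path reads the file line by line and tests the pattern against each line separately, so a match can never span more than one line. That doesn't help you here.

Read the whole file into a single string first: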

$fileContent = [io.file]::ReadAllText("C:\file.txt")

Or

$fileContent = Get-Content c:\file.txt -Raw

Taken from another post here.
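A short sketch of how the whole-file read fits together with the regex from the question. The temp file name and its contents are hypothetical, created here only so the snippet runs on its own; `[regex]::Match` stands in for Select-String.

```powershell
# Hypothetical sample file in the temp directory so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'specimen-sample.txt'
$sample = "SPECIMEN: Procedure: xxxx1 A) Location: yyyy2`r`nMajor zzz B) Location: something`r`ntext here C) more`r`n`r`n`r`nCLINICAL DIAGNOSIS: xyz`r`n"
[System.IO.File]::WriteAllText($path, $sample)

$regex = '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'

# Line by line: each line is tested separately, so nothing is returned.
$perLine = Select-String -Path $path -Pattern $regex

# Whole file as one string: the pattern can now span line breaks.
$content = [System.IO.File]::ReadAllText($path)
$whole = [regex]::Match($content, $regex)
$captured = $whole.Groups[1].Value.Trim()
$captured

# Clean up the hypothetical sample file.
Remove-Item $path
```

`$perLine` stays empty while `$whole` succeeds, which is the behavior described in the question.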

EurikaIam
1

Try this:

$regex = '(?ms).*SPECIMEN: Procedure:(.+)CLINICAL DIAGNOSIS: '

Get-Content $input_file -Delimiter 'CLINICAL DIAGNOSIS: '|
 foreach {@($_) -match 'CLINICAL DIAGNOSIS: ' -replace $regex,'$1'}

Using 'CLINICAL DIAGNOSIS: ' as the delimiter makes Get-Content return the file in chunks that each end at that marker. This eliminates the need to read in all the data at once and to resolve and capture multiple matches in a single pass.

mjolinor
0

Try this:

$input_file = gc 'c:\Path\0240188.txt' | out-string
# or: gc c:\path\xxxxx.txt -raw  #with v3+
$regex = '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:'
$input_file | select-string -Pattern $regex -AllMatches | % { $_.Matches }
# or: [regex]::matches($input_file, $regex) # much faster
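The pattern above uses the lazy quantifier `.*?` rather than `.*`. A rough sketch of why that matters when the text holds more than one record; the two-record `$text` sample is an assumption, not data from the question.

```powershell
# Assumed sample with two SPECIMEN/CLINICAL DIAGNOSIS records in one string.
$text = "SPECIMEN: Procedure: first`r`nCLINICAL DIAGNOSIS: a`r`nSPECIMEN: Procedure: second`r`nCLINICAL DIAGNOSIS: b"

# Greedy: .* runs to the last CLINICAL DIAGNOSIS:, merging both records into one match.
$greedy = [regex]::Matches($text, '(?s)\bSPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

# Lazy: .*? stops at the nearest CLINICAL DIAGNOSIS:, giving one match per record.
$lazy = [regex]::Matches($text, '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:')

$greedy.Count   # 1
$lazy.Count     # 2

# Collect the trimmed capture from each lazy match.
$captures = @($lazy | ForEach-Object { $_.Groups[1].Value.Trim() })
$captures       # first, second
```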
walid toumi
0

You could use a little regex trick like this:

Procedure:([\S\s]+)CLINICAL DIAGNOSIS

Since . matches everything except newlines, you can use [\S\s]+ instead: the character class combines non-whitespace and whitespace, so it matches any character, including line breaks. The capturing group (...) then holds everything between the two markers. This trick works if you want to avoid using the single-line flag.
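A minimal sketch of the character-class trick, applied to an in-memory string; the shortened `$text` sample is an assumption for illustration.

```powershell
# Assumed sample text with CRLF line endings.
$text = "Procedure: xxxx1`r`nMajor zzz`r`n`r`nCLINICAL DIAGNOSIS: xyz"

# No inline flags: [\S\s] matches any character, so the group spans the line breaks.
$m = [regex]::Match($text, 'Procedure:([\S\s]+)CLINICAL DIAGNOSIS')

# Trim() removes the leading space and the trailing blank lines.
$captured = $m.Groups[1].Value.Trim()
$captured
```

As with the (?s) approach, the input has to be a single string containing the whole text for the match to cross lines.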

Federico Piazza