I am trying to read a file and ignore everything up to a matching string. Sometimes the match appears on the same line as the results I need, so I can't use Select-Object -Skip x, where x is the number of lines to skip.
I have tried the .Split('<pre>') method on the results, and that worked, but I can't select an element by index because what is returned is a multi-line string.
Below is the start of an example of the text being returned. It's an HTML response that I'm trying to read the data out of. I cannot use the Content, as it's a ByteArray and has a space between every character. So I've concluded it's time to ask for help with [Regex] in PowerShell.
I was looking at this answer and thought I could use /.+?(?=abc)/ by replacing the search string, like this:
(Get-Content $env:TEMP\test.txt) | ForEach-Object {
    [Regex]::Match($_, "^.+(?=\<pre\>)").Value
}
That didn't work either. I'm OK with regex when looking for a match like {\d\d\d} to ensure it's 3 digits long, but I'm not sure how to use it in this instance.
This is the start of the file being returned. I need to ignore everything up to and including the characters <pre>; anything after that, to the end of the file, is OK.
Example command and result being returned here:
PS> Get-Content $env:TEMP\test.txt
HTTP/1.1 200 OK
Content-Length: 3524
Date: Thu, 18 Jun 2020 15:00:05 GMT
Last-Modified: Fri, 19 Jun 2020 01:00:05 GMT
Server: TTWS/1.2 on Microsoft-HTTPAPI/2.0
<!doctype html><html><body>
<p>Test TCP WebServer 1.2</p>
<pre>
Directory: C:\tmp
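For reference, here is a minimal sketch of the behaviour being asked for, assuming the file is read as one string (as with Get-Content -Raw) rather than line by line. The $raw sample and the $afterPre variable are hypothetical stand-ins, and the lookbehind pattern is only one possible way to express "everything after the first <pre>":

```powershell
# Hypothetical sample standing in for (Get-Content $env:TEMP\test.txt -Raw)
$raw = "HTTP/1.1 200 OK`n<!doctype html><html><body>`n<p>Test TCP WebServer 1.2</p>`n<pre>`nDirectory: C:\tmp`n"

# (?s) lets . match newlines; the lookbehind starts the match
# immediately after the first <pre>, so the tag itself is excluded
$afterPre = [regex]::Match($raw, '(?s)(?<=<pre>).*').Value
$afterPre
```

The per-line loop cannot do this because each line is matched in isolation, so a pattern never sees text that follows <pre> on later lines.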
EDIT:
I have this now, which removes everything up to and including the first <pre> tag and also removes the closing </pre> tag, but it won't remove anything AFTER the closing </pre> tag.
(Get-Content $env:TEMP\test.txt -Raw) -replace '(?s)^.*?<pre>' -replace '<\/pre>(.+?)'
Can that be extended to also remove everything after the closing </pre> tag, through to the end of the file?
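One possible direction, as an untested-against-the-real-file sketch: give the second -replace the same (?s) option and let a greedy .* run from the closing tag to the end of the string. The $raw sample and $body variable below are hypothetical, and the sketch assumes only the first </pre> matters:

```powershell
# Hypothetical sample standing in for (Get-Content $env:TEMP\test.txt -Raw)
$raw = "HTTP/1.1 200 OK`r`n<html><body>`r`n<pre>`r`nDirectory: C:\tmp`r`n</pre>`r`n</body></html>`r`n"

# First -replace: drop everything up to and including the first <pre>
# Second -replace: drop the first </pre> and everything after it;
# (?s) is needed again so that . also matches the line breaks
$body = $raw -replace '(?s)^.*?<pre>' -replace '(?s)</pre>.*$'

# Trim the line breaks left around the block
$body.Trim()
```

Without (?s) in the second pattern, . stops at the first line break, which would explain why text after the closing tag survives.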