I want to use PowerShell to search .html documents for specific strings and print them out.
Let me explain my first function, which works:
I use this function to find all .html documents in the path that contain the string "Tag". In those files I then search for the string "ID:", skip the tags "</TD><TD>",
and use the regular expression below to print out the following 32 characters, which are the ID. Below you see a part of the HTML file and then my function.
<TR VALIGN=TOP><TD>Lokation:</TD><TD>\Test1\blabla\asdf\1234\WS Auswertungen</TD></TR>
<TR VALIGN=TOP><TD>Beschreibung:</TD><TD></TD></TR>
<TR VALIGN=TOP><TD>Eigentümer:</TD><TD><IMG ALIGN=MIDDLE SRC="file:///C:\Users\D0262290\AppData\Local\Temp\23\User.bmp"> Wilmes, Tanja</TD></TR>
<TR VALIGN=TOP><TD>ID:</TD><TD>55C7B7F411E2661E001000806C38EBA0</TD></TR>
</TABLE></TD><TD><IMG ALIGN=MIDDLE SRC="file:///C:\Users\D0262290\AppData\Local\Temp\23\User.bmp">
The function:
Function searchStringID {
    # Find the .html files that contain the string "Tag"
    Get-ChildItem -Path C:\Users\blub\lala\Dokus -Filter *.html |
        Select-String -Pattern "Tag" |
        select Path |
        Get-ChildItem |
        foreach {
            # Strip all tags and whitespace, then capture the 32 characters after "ID:"
            if ((Get-Content -Raw -Path $_.FullName) -replace "<.*?>|\s" -match "(?s)ID:(?<Id>[a-z0-9]{32})" ) {
                printToOutputLog
            }
        }
}
All this works fine.
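To make clear what the -replace / -match step in the function does, here is a minimal sketch that runs the same two operations on the sample ID row from the HTML excerpt above. The variable names $row and $stripped are only placeholders for this illustration.

```powershell
# Sample row copied from the HTML excerpt above
$row = '<TR VALIGN=TOP><TD>ID:</TD><TD>55C7B7F411E2661E001000806C38EBA0</TD></TR>'

# Remove every tag and all whitespace, which leaves "ID:" directly followed by the value
$stripped = $row -replace '<.*?>|\s'

# -match is case-insensitive by default, so [a-z0-9] also accepts the upper-case hex digits
if ($stripped -match 'ID:(?<Id>[a-z0-9]{32})') {
    $id = $Matches['Id']
    $id
}
```

The named group (?<Id>...) is what makes the value available as $Matches['Id'] after a successful match.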
Now I need to extract two more pieces of information, and I can't figure out the regular expressions to use because the values have no fixed length. In both problems below I still have to check for the string "Tag" first.
My first problem:
I have to get the location of the file, so I need to search for the string "Lokation:" (see the HTML I posted above).
To get the information I have to skip the tags </TD><TD> again and use a regular expression to get the location. My problem here is that I have no idea how to handle the variable number of characters. Is there a way to print out the characters between "Lokation:</TD><TD>" and "</TD></TR>"?
The tags are all the same in the other html files, so I just need a solution which works for my example.
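A minimal sketch of one possible approach, assuming the row always sits on a single line as in the excerpt above: a lazy quantifier (.*?) captures as few characters as possible up to the first closing "</TD></TR>", so no fixed length is needed. It is only checked against the sample row; $row and $location are placeholder names.

```powershell
# Sample row copied from the HTML excerpt above (single quotes keep the backslashes literal)
$row = '<TR VALIGN=TOP><TD>Lokation:</TD><TD>\Test1\blabla\asdf\1234\WS Auswertungen</TD></TR>'

# Match on the raw text, tags included, so the space inside the location is preserved
if ($row -match 'Lokation:</TD><TD>(?<Location>.*?)</TD></TR>') {
    $location = $Matches['Location']
    $location
}
```

Unlike the ID function, this sketch does not strip tags and whitespace first, because removing whitespace would also remove the space in "WS Auswertungen".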
My second problem: I have to read out the object's name. In the HTML document it is stored in a comment like the one below. The object's name begins after "[OBJECT:" and ends with "]". Here again, I can't figure out which expression I could use. The special characters in the example object's name below can occur in real names.
<!-- ################################################################## -->
<!-- # [OBJECT: NAME BLA bla/ BLA_BLA 1 22:34] # -->
<!-- ################################################################## -->
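A minimal sketch of one possible pattern for the comment above, assuming the name itself never contains a "]": a negated character class ([^\]]+) takes everything up to the closing bracket, whatever special characters appear in between. It is only checked against the sample comment; $comment and $objectName are placeholder names.

```powershell
# Sample comment line copied from the excerpt above
$comment = '<!-- # [OBJECT: NAME BLA bla/ BLA_BLA 1 22:34] # -->'

# \[ and \] match literal brackets; \s* skips the blanks after "OBJECT:"
if ($comment -match '\[OBJECT:\s*(?<Name>[^\]]+)\]') {
    # Trim removes any padding blanks before the closing bracket
    $objectName = $Matches['Name'].Trim()
    $objectName
}
```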
I would be so thankful if anyone could help me. Every hint is useful to me because my brain is really stuck here. Thanks and cheers