
I want to use PowerShell to search .html documents for specific strings and print them out.

Let me explain my first function, which works: I use it to find all .html documents in the path that contain the string "Tag". In each of those files I then search for the string "ID:", skip the tags "</TD><TD>" and use the regular expression below to capture the next 32 characters, which are the ID. Below you see part of the html file and then my function.

<TR VALIGN=TOP><TD>Lokation:</TD><TD>\Test1\blabla\asdf\1234\WS Auswertungen</TD></TR>
<TR VALIGN=TOP><TD>Beschreibung:</TD><TD></TD></TR>
<TR VALIGN=TOP><TD>Eigentümer:</TD><TD><IMG ALIGN=MIDDLE SRC="file:///C:\Users\D0262290\AppData\Local\Temp\23\User.bmp">&nbsp;Wilmes, Tanja</TD></TR>
<TR VALIGN=TOP><TD>ID:</TD><TD>55C7B7F411E2661E001000806C38EBA0</TD></TR>
</TABLE></TD><TD><IMG ALIGN=MIDDLE SRC="file:///C:\Users\D0262290\AppData\Local\Temp\23\User.bmp">&nbsp;

The function:

Function searchStringID {
    Get-ChildItem -Path C:\Users\blub\lala\Dokus -Filter *.html |
    Select-String -Pattern "Tag" |
    select Path |
    Get-ChildItem |
    foreach {
        if ((Get-Content -Raw -Path $_.FullName) -replace "<.*?>|\s" -match "(?s)ID:(?<Id>[a-z0-9]{32})" ) {

            printToOutputLog
        }
   }
}

All this works fine.
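
To illustrate the two steps, here is a minimal sketch run against just the ID row from the excerpt above. The variable names $row and $stripped are made up for this illustration; the patterns are the same ones used in the function.

```powershell
# Sample row copied from the HTML excerpt above
$row = '<TR VALIGN=TOP><TD>ID:</TD><TD>55C7B7F411E2661E001000806C38EBA0</TD></TR>'

# Step 1: remove every tag and all whitespace, which leaves "ID:" directly followed by the ID
$stripped = $row -replace '<.*?>|\s'

# Step 2: -match is case-insensitive, so [a-z0-9] also accepts the uppercase hex digits
if ($stripped -match 'ID:(?<Id>[a-z0-9]{32})') {
    # The named group "Id" holds the 32 captured characters
    $Matches['Id']
}
```

Because the tags are stripped first, the fixed-length pattern never has to deal with "</TD><TD>" at all.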

Now I need to extract two more pieces of information, and I can't figure out which regular expression to use because the values have no fixed length. In both problems below I still have to check for the string "Tag" first.

My first problem: I have to get the location of the file, so I have to search for the string "Lokation:" (you can see it in the html I posted above). To get the information I have to skip the tags </TD><TD> again and use a regular expression to capture the location. My problem here is that I have no idea how to handle the variable number of characters. Is there a way to print out the characters between "Lokation:</TD><TD>" and "</TD></TR>"? The tags are the same in all the other html files, so I just need a solution that works for my example.

My second problem: I have to read out the object's name. In the html document it is stored in a comment, as shown below. The object's name begins after "[OBJECT:" and ends with "]". Here again, I can't figure out which expression to use. Any of the special characters in the example object's name below can occur.

 <!-- ################################################################## -->
 <!-- #  [OBJECT: NAME BLA bla/ BLA_BLA 1 22:34]  # -->
 <!-- ################################################################## -->

I would be so thankful if anyone could help me. Every hint is useful to me because my brain is really stuck here. Thanks and cheers

Elaice

1 Answer


Ok, this one gets the contents of each file and runs each line through a Switch to match against three regular expressions. It worked for me against your sample data. It assigns each match to a variable, one for each of the three things you are looking for, and then outputs one object per file.

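Function searchStringID {
    Get-ChildItem -Path C:\Users\blub\lala\Dokus -Filter *.html |
    # -List stops at the first "Tag" hit per file, so each file is processed only once
    Select-String -Pattern "Tag" -List |
    select Path |
    Get-ChildItem |
    foreach {
        # Reset per file so a value from the previous file is never carried over
        $ID = $Location = $Object = $null
        Switch -Regex (Get-Content -Path $_.FullName){
             # 32 alphanumeric characters preceded on the same line by "ID:"
             "((?<=ID:.+?)[a-z0-9]{32})" {$ID = $Matches[1]}
             # everything from the first backslash after a ">" up to the next "<"
             "Lokation:.+?>(\\[^<]+)"  {$Location = $Matches[1]}
             # everything after "OBJECT:" up to, but not including, the closing "]"
             "OBJECT: ?([^\]]+)"       {$Object = $Matches[1]}
        }
        [PSCustomObject][Ordered]@{
            'ID' = $ID
            'Location' = $Location
            'Name' = $Object
        }
    }
}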
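
If you want to see what each of the three patterns picks up without touching any files, here is a small sketch that feeds three lines from your sample straight into the same Switch. The $lines array is something I made up for the demonstration; the patterns are unchanged.

```powershell
# Sample lines copied from the question's HTML excerpt
$lines = @(
    '<TR VALIGN=TOP><TD>Lokation:</TD><TD>\Test1\blabla\asdf\1234\WS Auswertungen</TD></TR>'
    '<TR VALIGN=TOP><TD>ID:</TD><TD>55C7B7F411E2661E001000806C38EBA0</TD></TR>'
    ' <!-- #  [OBJECT: NAME BLA bla/ BLA_BLA 1 22:34]  # -->'
)

$ID = $Location = $Object = $null
Switch -Regex ($lines) {
    # Lookbehind requires "ID:" earlier on the line; the match itself is only the 32 characters
    "((?<=ID:.+?)[a-z0-9]{32})" { $ID = $Matches[1] }
    # .+? skips lazily past "</TD><TD" to the ">" that is followed by a backslash;
    # [^<]+ then captures up to the next tag, whatever the length
    "Lokation:.+?>(\\[^<]+)"    { $Location = $Matches[1] }
    # [^\]]+ captures any run of characters that are not "]", so the name can be any length
    "OBJECT: ?([^\]]+)"         { $Object = $Matches[1] }
}

$ID
$Location
$Object
```

The negated character classes ([^<] and [^\]]) are what handle the variable length: they keep consuming until they hit the delimiter you care about.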

So then you could assign that to a variable and have an array of results to do with as you please (output to CSV? Sure! Display to the screen as a table? Can do! Email to the entire company? Um, yeah, but I wouldn't recommend that.)
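
A rough sketch of the first two options, assuming the results look like the objects the function emits. The hard-coded object is a stand-in for `$results = searchStringID`, and the temp-folder CSV path is just a placeholder.

```powershell
# Stand-in for: $results = searchStringID
$results = @(
    [PSCustomObject]@{
        ID       = '55C7B7F411E2661E001000806C38EBA0'
        Location = '\Test1\blabla\asdf\1234\WS Auswertungen'
        Name     = 'NAME BLA bla/ BLA_BLA 1 22:34'
    }
)

# Display on screen as a table
$results | Format-Table -AutoSize

# Write to CSV; -NoTypeInformation leaves out the "#TYPE" header line
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'searchStringID-results.csv'
$results | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8
```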

Here's what it gave me when I ran it against your sample:

ID                               Location                                Name
--                               --------                                ----
55C7B7F411E2661E001000806C38EBA0 \Test1\blabla\asdf\1234\WS Auswertungen NAME BLA bla/ BLA_BLA 1 22:34

TheMadTechnician
  • Thank you! Couldn't imagine that the solution would be so simple. That's exactly the output I wanted. – Elaice Sep 11 '14 at 11:41