I am trying to retrieve some information from a website: I want to find elements with a specific tag/class and return the text they contain (innerText; innerHTML would also include any nested markup). This is what I have so far:
```powershell
$request = Invoke-WebRequest -Uri $url -UseBasicParsing
$HTML = New-Object -Com "HTMLFile"
$src = $request.RawContent
$HTML.write($src)
foreach ($obj in $HTML.all) {
    $obj.getElementsByClassName('some-class-name')
}
```
I think there is a problem with converting the HTML into the HTML object: I see a lot of undefined properties and empty results when I try to pipe them to `Select-Object`.
After spending two days on this, how am I supposed to parse HTML with PowerShell?
- I can't use the `IHTMLDocument2` methods, since I don't have Office installed (see "Unable to use IHTMLDocument2").
- I can't use `Invoke-WebRequest` without `-UseBasicParsing`, since PowerShell hangs and spawns additional windows when accessing the `ParsedHtml` property (see "parsedhtml doesnt respond anymore" and "Using Invoke-Webrequest in PowerShell 3.0 spawns a Windows Security Warning").
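For reference, a minimal sketch that stays within both constraints (`-UseBasicParsing`, no Office), assuming Windows and the same `HTMLFile` COM object as above. It relies on a commonly reported workaround rather than documented behavior: when `IHTMLDocument2_write` is unavailable, pass the markup to `write()` as UTF-16 bytes. It also filters `.all` by class name once, instead of calling `getElementsByClassName` on every element as the loop above does. `Get-ElementTextByClass` is a hypothetical helper name, and the sketch has not been checked across Windows/PowerShell versions.

```powershell
function Get-ElementTextByClass {
    param(
        [Parameter(Mandatory)][string] $Html,
        [Parameter(Mandatory)][string] $ClassName
    )

    # MSHTML document object (Windows only), same COM class as in the question
    $doc = New-Object -ComObject 'HTMLFile'
    try {
        # Works when the mshtml interop assembly (installed with Office) is present
        $doc.IHTMLDocument2_write($Html)
    }
    catch {
        # Reported workaround without that assembly: pass the markup as UTF-16 bytes
        $doc.write([System.Text.Encoding]::Unicode.GetBytes($Html))
    }

    # Walk every element once and keep those whose class list contains the whole token.
    # Filtering .all avoids depending on getElementsByClassName, which the document
    # mode chosen by the HTMLFile object may not support.
    foreach ($el in $doc.all) {
        $classes = ([string]$el.className) -split '\s+'
        if ($classes -contains $ClassName) {
            $el.innerText
        }
    }
}

# Usage (note .Content, the response body; .RawContent also contains the
# HTTP status line and headers, which end up in the parsed document):
# $request = Invoke-WebRequest -Uri $url -UseBasicParsing
# Get-ElementTextByClass -Html $request.Content -ClassName 'some-class-name'
```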
Since parsing HTML with regex is such a big no-no, how do I do it otherwise? Nothing seems to work.
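If MSHTML keeps misbehaving, a parser that does not use COM at all sidesteps the problem. A sketch assuming the third-party HtmlAgilityPack library (not part of PowerShell or .NET) has been downloaded; the DLL path and the `Get-TextByClassHap` name are hypothetical. It selects elements by class with an XPath expression instead of a regex.

```powershell
# Assumption: HtmlAgilityPack.dll (third-party, from the HtmlAgilityPack NuGet package)
# is available locally; this path is hypothetical, adjust it to where the DLL lives.
Add-Type -Path 'C:\tools\HtmlAgilityPack\HtmlAgilityPack.dll'

function Get-TextByClassHap {
    param(
        [Parameter(Mandatory)][string] $Html,
        [Parameter(Mandatory)][string] $ClassName
    )

    $doc = New-Object HtmlAgilityPack.HtmlDocument
    $doc.LoadHtml($Html)   # tolerant parser: malformed HTML is accepted

    # XPath 1.0 idiom for "the class attribute contains this whole token"
    $xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $ClassName ')]"
    $nodes = $doc.DocumentNode.SelectNodes($xpath)

    # By default SelectNodes returns $null (not an empty collection) when nothing matches
    if ($null -eq $nodes) { return }

    foreach ($node in $nodes) {
        # InnerText leaves entities such as &amp; encoded; DeEntitize decodes them
        [HtmlAgilityPack.HtmlEntity]::DeEntitize($node.InnerText).Trim()
    }
}

# Usage:
# $request = Invoke-WebRequest -Uri $url -UseBasicParsing
# Get-TextByClassHap -Html $request.Content -ClassName 'some-class-name'
```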