I want to extract values from an HTML document. In another program (ui.vision / Selenium) I can do this with XPath expressions, and I have already worked out a large set of working XPaths that I now want to use in PowerShell. I have the string $html, which contains everything from <html> to </html> (inclusive).
As far as I have researched, I need an XML object in order to use 'Select-Xml' with XPath expressions.
In order to convert $html to xml I tried to cast:
[xml]$xml = $html
as well as
$xml = [xml]$html
and I tried to convert:
$html = $html | ConvertTo-xml
All of these failed. I suspect the input has to be well-formed XML, and my HTML is not, even though it is valid HTML that passes the W3C validator without warnings. It is minified, and most attribute values are not enclosed in quotation marks.
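Here is a minimal sketch of what I am seeing. The markup is a made-up stand-in, not my real page, and the variable names $sample and $quoted are only for illustration. It assumes the unquoted attributes are what breaks the cast; there may be other causes in my real document.

```powershell
# Hypothetical stand-in for the real page: minified HTML with unquoted attribute values
$sample = '<html><head><title>Test</title></head><body><div id=main class=box>Hello</div></body></html>'

try {
    $xml = [xml]$sample
    $castResult = 'cast succeeded'
}
catch {
    # The XML parser rejects the unquoted attribute values, so the cast throws
    $castResult = "cast failed: $($_.Exception.Message)"
}
$castResult

# The same markup with quoted attribute values is well-formed XML,
# so the cast works and Select-Xml can evaluate an XPath expression against it
$quoted = '<html><head><title>Test</title></head><body><div id="main" class="box">Hello</div></body></html>'
$node = ([xml]$quoted) | Select-Xml -XPath '//div[@id="main"]'
$node.Node.InnerText
```

So the XPath itself is fine once the input parses as XML; the problem is getting from real-world HTML to something the parser accepts.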
So how can I run XPath queries against a string containing an HTML page? I am about to resort to regular expressions, but translating all the XPath expressions looks like a lot of work.