1

I'm using the following Powershell script that calls an API that returns an HTML table. I need to convert the column of data with the header 'DBName' to an array that I can use later in the script.

I have tried using the following and nothing seems to work.

$req = Invoke-WebRequest $url
$table = $req.ParsedHtml.getElementsByTagName('table')

Here is the Powershell script I am using to call the API and then set the results to the variable "$DBList"

$SomeServer = 'databaseserver.domain.com,1433'

$PsCredential = New-Object System.Management.Automation.PSCredential -ArgumentList ($username, $password)

$API_CALL = @{
Method = "GET"
URI    = "HTTPS://software.domain.com:111/TEST1/DATA/Stuff?Target=$SomeServer&Query_Name=sys_databases&Technology=SQLServer"
Credential = $PsCredential
}

#Get the webpage
$HTML = Invoke-RestMethod @API_CALL -verbose 

<#

Something converts the HTML table to a PS object that I can extract the database names

#>


$DBList = $HTML | Select-Object -Property name | Where-Object name -NotMatch "master|model|msdb|tempdb" | Sort-Object -Property name

In Powershell, if you select $HTML, this is what is generated when I call the API:

<!DOCTYPE html>
    <html>
    <head>
        <link rel="icon" href="data:,">
        <meta name="google" content="notranslate">        
    </head> 
    <title>"Results</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <style>
        /* W3.CSS 4.13 June 2019 by Jan Egil and Borge Refsnes */
        html{box-sizing:border-box}*,*:before,*:after{box-sizing:inherit}
        /* Extract from normalize.css by Nicolas Gallagher and Jonathan Neal git.io/normalize */
        .w3-table,.w3-table-all{border-collapse:collapse;border-spacing:0;width:100%;display:table}.w3-table-all{border:1px solid #ccc}
        .w3-bordered tr,.w3-table-all tr{border-bottom:1px solid #ddd}.w3-striped tbody tr:nth-child(even){background-color:#f1f1f1}
        .w3-table-all tr:nth-child(odd){background-color:#fff}.w3-table-all tr:nth-child(even){background-color:#f1f1f1}
        .w3-hoverable tbody tr:hover,.w3-ul.w3-hoverable li:hover{background-color:#ccc}.w3-centered tr th,.w3-centered tr td{text-align:center}
        .w3-table td,.w3-table th,.w3-table-all td,.w3-table-all th{padding:8px 8px;display:table-cell;text-align:left;vertical-align:top}
        .w3-table th:first-child,.w3-table td:first-child,.w3-table-all th:first-child,.w3-table-all td:first-child{padding-left:16px}
        .container2 {
            position: relative;
            width: 99%;
            height: 100%;
            overflow: hidden;
            padding-top: 37%; /* 16:9 Aspect Ratio */
        }
        .responsive-iframe2 {
            position: absolute;
            top: 1%;
            left: 0;
            bottom: 0;
            right: 0;
            width: 99.5%;
            height: 98%;
            border: none;
        } 
    </style>
    <body>
    <div style="overflow-x:auto;">
        <table class="w3-table-all w3-hoverable">
            <thead>
                <tr class="w3-light-grey">
                    <th>collation_name</th>
                    <th>compatibility_level</th>
                    <th>ConnectionString</th>
                    <th>DBname</th>
                </tr>
            </thead>        
            <tr> 
                <td>SQL_Latin1_General_CP1_CI_AS</td>
                <td>140</td>
                <td>server.domain.com,1433</td>
                <td>Database_1</td>
            </tr>  
            <tr> 
                <td>SQL_Latin1_General_CP1_CI_AS</td>
                <td>140</td>
                <td>server.domain.com,1433</td>
                <td>Database_2</td>
            </tr>  
            <tr> 
                <td>SQL_Latin1_General_CP1_CI_AS</td>
                <td>140</td>
                <td>server.domain.com,1433</td>
                <td>Database_3</td>
            </tr>  
         </table>
    </div>
        <p></p>      
    </body>
</html> 

At the end, I would like for the variable $DBList to show: Database_1 Database_2 Database_3

Jeremy F.
  • 1,778
  • 11
  • 51
  • 86

2 Answers2

0

It seems that all parsing done in Invoke-WebRequest somehow depends on Internet Explorer. I think we all agree that was a bad idea, and I am not surprised it's broken (doesn't work for me either, don't ask me why ... there are already other posts about this.)

The best way is probably to use some HTML-parsing library. I like HTML Agility Pack.

However, I was able to create an all .NET/PowerShell version based on this post that uses the Windows Forms WebBrowser:

$HTML = Invoke-RestMethod @API_CALL -Verbose 
Add-Type -AssemblyName System.Windows.Forms
$wb = New-Object System.Windows.Forms.WebBrowser
$wb.ScriptErrorsSuppressed = $true
$wb.DocumentText = ""
$doc = $wb.Document.OpenNew($true)
$doc.Write($HTML)
($DBList = $doc.GetElementsByTagName("table").GetElementsByTagName("tr") | foreach {
    $_.GetElementsByTagName("td") | select -First 1 -Skip 3 -ExpandProperty InnerText
})
marsze
  • 15,079
  • 5
  • 45
  • 61
0

There are a few ConvertFrom-HtmlTable, Get-HtmlTable, etc. cmdlets around but found them somewhat limited with regards to case insensitive HTML tags and embedded table tags (as in your case: a <tr> tag in a <thead> tag).
Therefore, I decided to created my own Read-HtmlTable:

Function Read-HtmlTable {
    [CmdletBinding()][OutputType([Object[]])] param(
        [Parameter(ValueFromPipeLine = $True, Mandatory = $True)][String]$html
    )
    Begin {
        Function Get-TopElements { # Get elements by (insensitive) name that do not have a (grand)parent with the same name
            [CmdletBinding()][OutputType([Xml.XmlElement[]])] param(
                [Parameter(Mandatory = $True)][String]$Name,
                [Parameter(Mandatory = $True, ValueFromPipeLine = $True)][Xml.XmlElement]$Element
            )
            if ($Element.Name -eq $Name) { $Element }
            else { $Element.SelectNodes('*') | Foreach-Object { $_ | Get-TopElements $Name } } 
        }
    }
    Process {
        $body = ([Xml]($html -Replace '^[\s\S]*(?=\<body\>)' -Replace '(?<=\<\/body\>)[\s\S]*$')).body
        foreach($table in ($body | Get-TopElements 'table')) {
            $Names = $Null
            foreach ($tr in ($table | Get-TopElements 'tr')) {
                if (!$Names) { $Names = ($tr | Get-TopElements 'th').'#text' }
                if (!$Names) { $Names = ($tr | Get-TopElements 'td').'#text' }
                else {
                    $Values = ($tr | Get-TopElements 'td').'#text'
                    $Properties = @{}; $i = 0
                    Foreach ($Value in $Values) { $Properties[$Names[$i++]] = $Value }
                    [pscustomobject]$Properties
                }
            }
        }
    }
}

$HTML | Read-HtmlTable

collation_name               compatibility_level DBname     ConnectionString
--------------               ------------------- ------     ----------------
SQL_Latin1_General_CP1_CI_AS 140                 Database_1 server.domain.com,1433
SQL_Latin1_General_CP1_CI_AS 140                 Database_2 server.domain.com,1433
SQL_Latin1_General_CP1_CI_AS 140                 Database_3 server.domain.com,1433
iRon
  • 20,463
  • 10
  • 53
  • 79