0

I am trying to process the results of a web query. I am doing

$data = Invoke-WebRequest $uri $data.RawContent

which returns this

HTTP/1.1 200 OK Transfer-Encoding: chunked Expires: Thu, 01 Jan 1970 00:00:00 GMT Set-Cookie: JSESSIONID=19gbg0sejwxXXXXXXpzwlasaaeer;Path=/informer;Secure Server: Jetty(8.1.8.v20121106) <table><tr><td>username</td><td>firstname</td><td>lastname</td><td>idnumber</td><td>auth</td><td>course1</td></tr><tr valign="top"><td style="mso-number-format:'\@';">d_smith</td><td style="mso-number-format:'\@';">Daniel</td><td style="mso-number-format:'\@';">smith</td><td style="mso-number-format:'\@';">2221840</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-101</td></tr><tr valign="top"><td style="mso-number-format:'\@';">d_rutherford</td><td style="mso-number-format:'\@';">Daniel</td><td style="mso-number-format:'\@';">rutherford</td><td style="mso-number-format:'\@';">2223039</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-101</td></tr><tr valign="top"><td style="mso-number-format:'\@';">j_smithe21</td><td style="mso-number-format:'\@';">James</td><td style="mso-number-format:'\@';">smithe</td><td style="mso-number-format:'\@';">2221844</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-101</td></tr><tr valign="top"><td style="mso-number-format:'\@';">h_phillipsiii</td><td style="mso-number-format:'\@';">Harvey</td><td style="mso-number-format:'\@';">phillips III</td><td style="mso-number-format:'\@';">2221845</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-101</td></tr><tr valign="top"><td style="mso-number-format:'\@';">h_phillipsiii</td><td style="mso-number-format:'\@';">Harvey</td><td style="mso-number-format:'\@';">phillips III</td><td style="mso-number-format:'\@';">2221845</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-102</td></tr><tr valign="top"><td style="mso-number-format:'\@';">d_rutherford</td><td style="mso-number-format:'\@';">Daniel</td><td style="mso-number-format:'\@';">rutherford</td><td style="mso-number-format:'\@';">2223039</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-102</td></tr><tr valign="top"><td style="mso-number-format:'\@';">j_smithe21</td><td style="mso-number-format:'\@';">James</td><td style="mso-number-format:'\@';">smithe</td><td style="mso-number-format:'\@';">2221844</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-102</td></tr><tr valign="top"><td style="mso-number-format:'\@';">d_smith</td><td style="mso-number-format:'\@';">Daniel</td><td style="mso-number-format:'\@';">smith</td><td style="mso-number-format:'\@';">2221840</td><td style="mso-number-format:'\@';">ldap</td><td style="mso-number-format:'\@';">2017CE1-CTP-8001-102</td></tr><tr valign="top"><td></td></tr></table>

notice there is no HTML or body tags. As best I can figure, this is why I am not seeing the ParsedHTML option. These are the only properties $data has are shown below $data Properties.

$data.Content returns the character codes to make up the text $data.RawContent produces. This is the table when it is rendered in a web browser rendered table

How can I convert this table to a CSV or PSCustomObject or something similar so that I can iterate through the lines and filter and process the data?

PsychoData
  • 1,198
  • 16
  • 32
  • You may take a look at [JohnLBevan](https://stackoverflow.com/users/361842/johnlbevan)s [answer](https://stackoverflow.com/a/31949668/6811411) in a [similar question](https://stackoverflow.com/questions/25918094/how-to-convert-html-table-to-csv-file-with-same-structure-with-powershell). –  Jul 09 '17 at 20:09

1 Answers1

1

Since the response is valid XML, you can use PowerShells XML parsing ability, like this:

$data = Invoke-WebRequest $uri
[xml]$x = $data.RawContent

$first = $true
$OldOFS = $OFS
$OFS = '","'

foreach ($tr in $x.table.tr)
{
    if ($first)
    {
        $first = $false
        """$($tr.td)""" | Out-File "out.csv"
    }
    elseif ($tr.td)
    {
        """$($tr.td.'#text')""" | Out-File "out.csv" -Append
    }
}

$OFS = $OldOFS

notepad out.csv

The $OFS hack is just quick and dirty to get the arrays converted to something csv-like. You could for example also use the -join operator.

The $tr.td in the first row produces an array of header strings, the $tr.td.'#text' produces the array of values for each remaining, non-empty row.

Hope that helps.

TToni
  • 9,145
  • 1
  • 28
  • 42
  • YAAASSSSSSSSSSS – PsychoData Jul 09 '17 at 20:47
  • When I put in the table, I wasn't 100% correct. there was some additional HTML headers and such before it. https://i.imgur.com/ixc7X7O.png I took the lazy way and removed the content before `` with `[xml]$x = $data.RawContent.Remove(0,$data.RawContent.IndexOf("
    ") -1 ) ` The rest was just like you had, but your answer is 100% correct before the edit
    – PsychoData Jul 09 '17 at 20:49