I have the following HTML table:
<!DOCTYPE html>
<html>
<head>
<style>
table {
font-family: arial, sans-serif;
border-collapse: collapse;
width: 100%;
}
td, th {
border: 1px solid #dddddd;
text-align: left;
padding: 8px;
}
tr:nth-child(even) {
background-color: #dddddd;
}
</style>
</head>
<body>
<h2>HTML Table</h2>
<table>
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>
<tr>
<td>Alfreds Futterkiste</td>
<td>Maria Anders</td>
<td>Germany</td>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
<tr>
<td>Ernst Handel</td>
<td>Roland Mendel</td>
<td>Austria</td>
</tr>
<tr>
<td>Island Trading</td>
<td>Helen Bennett</td>
<td>UK</td>
</tr>
<tr>
<td>Laughing Bacchus Winecellars</td>
<td>Yoshi Tannamuri</td>
<td>Canada</td>
</tr>
<tr>
<td>Magazzini Alimentari Riuniti</td>
<td>Giovanni Rovelli</td>
<td>Italy</td>
</tr>
</table>
</body>
</html>
I would like to match all occurrences of <th>table headers</th> and <td>table data</td>.
For the <td>table data</td>, I have managed to invoke a web request, retrieve the HTML file, and am now in the process of extracting the table contents:
$Table = $Data.Content
$NumberOfColumns = ($Table | Select-String "<th>" -AllMatches).Matches.Count
$NumberOfRows = ($Table | Select-String "<td>" -AllMatches).Matches.Count
$AllMatches = @()
$Found = $Table -match "(?<=<td>)[a-zA-Z0-9 _-]{1,99}(?=</td>)"
ForEach ($Row in $NumberOfRows)
{
If ($Found -eq $True)
{
$AllMatches += $Matches
}
}
$AllMatches
I get this output:
Name                           Value
----                           -----
0                              Alfreds Futterkiste
I would like to get a list of all of the matches embedded in th and td, not just the first one. (I am running PowerShell Core 6.2, so the ParsedHtml property is not an option; I would like to parse the table manually.)
Any suggestions are greatly appreciated.
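
To make the goal concrete, here is a minimal sketch of the kind of result I am after. It is one possible approach, not a definitive one. The -match operator only records the first match in $Matches, whereas [regex]::Matches returns every match. The sketch assumes each cell sits on its own line and has no attributes or nested tags. The inline sample stands in for $Data.Content, and $Headers, $Cells and $Rows are names made up for illustration.

```powershell
# Sample input standing in for $Data.Content (assumption: same markup as the table above)
$Table = @'
<table>
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>
<tr>
<td>Alfreds Futterkiste</td>
<td>Maria Anders</td>
<td>Germany</td>
</tr>
<tr>
<td>Centro comercial Moctezuma</td>
<td>Francisco Chang</td>
<td>Mexico</td>
</tr>
</table>
'@

# [regex]::Matches returns all matches; @() keeps the result an array even for a single hit
$Headers = @([regex]::Matches($Table, '(?<=<th>).*?(?=</th>)') | ForEach-Object { $_.Value })
$Cells   = @([regex]::Matches($Table, '(?<=<td>).*?(?=</td>)') | ForEach-Object { $_.Value })

# Group the flat list of cells into rows, using the header texts as property names
$Rows = @(
    for ($i = 0; $i -lt $Cells.Count; $i += $Headers.Count) {
        $Row = [ordered]@{}
        for ($j = 0; $j -lt $Headers.Count; $j++) {
            $Row[$Headers[$j]] = $Cells[$i + $j]
        }
        [pscustomobject]$Row
    }
)

$Rows
```

With this sample, $Rows[0].Company is Alfreds Futterkiste and $Rows[1].Country is Mexico. Markup with attributes, nested tags, or cells spanning several lines would need a more tolerant pattern or a real HTML parser.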