I've got to grab some product data off an existing website to put into a database. The data is all in HTML table format, the model numbers are unique, but each product can have any number of different attributes (so the tables I need to parse all have different columns and headings).
<table>
<tr>
<td>Model No.</td>
<td>Weight</td>
<td>Colour</td>
<td>Etc..</td>
</tr>
<tr>
<td>8572</td>
<td>12 Kg</td>
<td>Red</td>
<td>Blah..</td>
</tr>
<tr>
<td>7463</td>
<td>7 Kg</td>
<td>Blue</td>
<td>Blah..</td>
</tr>
<tr>
<td>8332</td>
<td>42 Kg</td>
<td>Yellow</td>
<td>Blah..</td>
</tr>
</table>
This is the CSV output format I'm looking for:
Model-No,Attribute-Name,Attribute-Value
8572,"Weight","12 Kg"
8572,"Colour","Red"
8572,"Etc","Blah.."
7463,"Weight","7 Kg"
7463,"Colour","Blue"
7463,"Etc","Blah.."
8332,"Weight","42 Kg"
8332,"Colour","Yellow"
8332,"Etc","Blah.."
As the tables all appear to be valid xhtml I'll probably load each one into an XmlDocument, but does anyone have any suggestions for a better way of accomplishing this? Thanks.