I have an HTML file containing many div and table tags. The div sections contain ids that relate to other div sections, and each div section is followed by a table section that holds the data I need. I want to parse this file into some sort of structure (arrays, lists, dicts, etc.) so that I can easily search for related info and extract what I need from it.
Example of what's in the HTML file:
<DIV class="info"> <A name="bc968f9fa2db71455f50e0c13ce50e871fS7f0e"
id="bc968f9fa2db71455f50e0c13ce50e871fS7f0e">
<B>WORKSPACE_WEBAPP</B> (WORKSPACE_WEBAPP)<BR/> <B>Object ID:
</B> bc968f9fa2db71455f50e0c13ce50e871fS7f0e<BR/> <B>Last
Modified Date : </B> 26-Sep-13 10:41:13<BR/>
<B>Properties:</B><BR/> </DIV>
<TABLE class="properties"> <TR class="header"><TH>Property
Name</TH><TH>Property Value</TH></TR>
<TR><TD>serverName</TD><TD>FoundationServices0</TD></TR>
<TR><TD>context</TD><TD>workspace</TD></TR>
<TR><TD>isCompact</TD><TD>false</TD></TR>
<TR><TD>AppServer</TD><TD>WebLogic 10</TD></TR>
<TR><TD>port</TD><TD>28080</TD></TR>
<TR><TD>maintVersion</TD><TD>11.1.2.2.0.66</TD></TR>
<TR><TD>version</TD><TD>11.1.2.0</TD></TR>
<TR><TD>SSL_Port</TD><TD>28443</TD></TR>
<TR><TD>instance_home</TD><TD>/essdev1/app/oracle/Middleware/user_projects/epmsystem1</TD></TR>
<TR><TD>configureBPMUIStaticContent</TD><TD>true</TD></TR>
<TR><TD>validationContext</TD><TD>workspace/status</TD></TR> </TABLE>
I want to create an array of these div sections in which each entry also holds the properties from the table that follows it. I just can't work out the best way to do it. The answer will probably involve using BeautifulSoup to parse the tags. Since nothing but their order in the file relates a table section to its div section, I believe I'll have to load the file a line at a time and process it that way, unless there is an easier method? Any ideas would be very helpful.
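For illustration, here is a minimal sketch of the kind of structure described above, built in one pass over the tags rather than line by line. It swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs. The names `SectionParser` and `parse_sections` are made up for this example, and the sketch assumes every section looks like the sample: a `div` with class `info` holding an `a` tag with an `id`, followed by a `table` with class `properties` whose data rows have exactly two `td` cells.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Pairs each <div class="info"> with the properties table that follows it."""

    def __init__(self):
        super().__init__()
        self.sections = {}     # object id -> {"name": ..., "properties": {...}}
        self._current = None   # section that the next table belongs to
        self._in_info = False  # inside a <div class="info">
        self._in_name = False  # inside the first <b> of the info div
        self._in_table = False # inside a <table class="properties">
        self._row = []         # cell texts of the row being read
        self._cell = None      # text chunks of the <td> being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <DIV> arrives as "div".
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "info":
            self._in_info = True
            self._current = None  # forget the previous section
        elif tag == "a" and self._in_info and "id" in attrs:
            self._current = {"name": None, "properties": {}}
            self.sections[attrs["id"]] = self._current
        elif (tag == "b" and self._in_info and self._current is not None
                and self._current["name"] is None):
            self._in_name = True
        elif tag == "table" and attrs.get("class") == "properties":
            self._in_table = True
        elif tag == "tr" and self._in_table:
            self._row = []
        elif tag == "td" and self._in_table:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_info = False
        elif tag == "b":
            self._in_name = False
        elif tag == "td" and self._cell is not None:
            # Collapse whitespace in case a value wraps across lines.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._in_table:
            # Header rows use <th>, so they yield no cells and are skipped here.
            if len(self._row) == 2 and self._current is not None:
                key, value = self._row
                self._current["properties"][key] = value
            self._row = []
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        if self._in_name:
            self._current["name"] = data.strip()
        elif self._cell is not None:
            self._cell.append(data)


def parse_sections(html_text):
    """Return a dict keyed by object id, one entry per info div."""
    parser = SectionParser()
    parser.feed(html_text)
    parser.close()
    return parser.sections
```

With the sample above, `parse_sections(html_text)["bc968f9fa2db71455f50e0c13ce50e871fS7f0e"]["properties"]["port"]` would look up the `port` value for that object. All values are kept as strings. Keying the outer dict by object id is a design choice made here so that one section can be looked up from an id found in another.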