I've been working on this small piece for hours now and can't find a solution, even though it should be simple. This time I'm posting the actual data rather than simplified examples, because I can't get the simplified examples to carry over to the real thing.
I'm trying to do this with built-in modules (though if you have the answer using bs4 I'd like to know it as well). It should be a simple thing.
I have two files, an HTML file that goes like this.
<b>Match #139</b></font></td></tr><tr bgcolor="#EEEEEE"><td align="CENTER" width="10%"><font color="Green" face="Tahoma,Arial" size="2"><b>Yes</b></font></td><td nowrap=""> <font face="Tahoma,Arial" size="2"><a href="http://www.bricklink.com/catalogItem.asp?P=3822pb01">3822pb01</a> </font></td><td><font face="Tahoma,Arial" size="2"><b>Door 1 x 3 x 1 Left with 'POLICE' Pattern</b></font><font class="fv"><br><a href="http://www.bricklink.com/catalog.asp">Catalog</a>: <a href="http://www.bricklink.com/catalogTree.asp?itemType=P">Parts</a>: <a href="http://www.bricklink.com/catalogList.asp?catType=P&catID=642">Door, Decorated</a></font></td><td nowrap=""><font class="fv"> </font></td></tr><tr bgcolor="#FFFFFF"><td align="CENTER" width="10%"><font color="Green" face="Tahoma,Arial" size="2"><b>Yes</b></font></td><td nowrap=""> <font face="Tahoma,Arial" size="2"><a href="http://www.bricklink.com/catalogItem.asp?P=3821pb01">3821pb01</a> </font></td><td><font face="Tahoma,Arial" size="2"><b>Door 1 x 3 x 1 Right with 'POLICE' Pattern</b></font><font class="fv"><br><a href="http://www.bricklink.com/catalog.asp">Catalog</a>: <a href="http://www.bricklink.com/catalogTree.asp?itemType=P">Parts</a>: <a href="http://www.bricklink.com/catalogList.asp?catType=P&catID=642">Door, Decorated</a></font></td><td nowrap=""><font class="fv"> </font></td></tr><tr bgcolor="#5E5A80"><td colspan="4"><font face="Tahoma,Arial" size="2" color="#FFFFFF"> <b>Match #140</b></font></td></tr><tr bgcolor="#EEEEEE"><td align="CENTER" width="10%"><font color="Green" face="Tahoma,Arial" size="2"><b>Yes</b></font></td><td nowrap=""> <font face="Tahoma,Arial" size="2"><a href="http://www.bricklink.com/catalogItem.asp?P=3822pb02">3822pb02</a> </font></td><td><font face="Tahoma,Arial" size="2"><b>Door 1 x 3 x 1 Left with Classic Fire Logo Pattern</b></font><font class="fv"><br><a href="http://www.bricklink.com/catalog.asp">Catalog</a>: <a href="http://www.bricklink.com/catalogTree.asp?itemType=P">Parts</a>: <a 
href="http://www.bricklink.com/catalogList.asp?catType=P&catID=642">Door, Decorated</a></font></td><td nowrap=""><font class="fv"> </font></td></tr><tr bgcolor="#FFFFFF"><td align="CENTER" width="10%"><font color="Green" face="Tahoma,Arial" size="2"><b>Yes</b></font></td><td nowrap=""> <font face="Tahoma,Arial" size="2"><a href="http://www.bricklink.com/catalogItem.asp?P=3821pb02">3821pb02</a> </font></td><td><font face="Tahoma,Arial" size="2"><b>Door 1 x 3 x 1 Right with Classic Fire Logo Pattern</b></font><font class="fv"><br><a href="http://www.bricklink.com/catalog.asp">Catalog</a>: <a href="http://www.bricklink.com/catalogTree.asp?itemType=P">Parts</a>: <a href="http://www.bricklink.com/catalogList.asp?catType=P&catID=642">Door, Decorated</a></font></td><td nowrap=""><font class="fv"> </font></td></tr><tr bgcolor="#5E5A80"><td colspan="4"><font face="Tahoma,Arial" size="2" color="#FFFFFF"> <b>
Please don't kill me, yes, it's only a line. You can paste it into some code editor to see it in multiple lines. The file continues with more "Matches".
I want to do two things.
1st, I want to create a dictionary that uses the match number as its key. So, for example, it would be
matches = {'139' : 'etc', '140' : 'etc'}
2nd, if you look at the HTML, the first link after each Match contains a part number; for example, the first one is 3822pb01. There are usually 2 part numbers inside a match, and I want to store those part numbers as a list (a tuple would be fine too) under that match's key in the dict.
matches = {'139' : ['3822pb01', '3821pb01'], '140' : ['3822pb02', '3821pb02']}
So far, I have been able to strip out the part numbers, or the Match #'s, but not correlate the part #'s and the Match #'s.
Could someone help me approach this? It's a little beyond my current knowledge.
Here's the full HTML file - http://pastebin.com/raw.php?i=eWWh4XfM - HTML doesn't have the best formatting
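To make the goal concrete, here is a rough sketch of one possible approach using only the built-in `re` module. It is not a definitive solution. It assumes that every match header appears in the text as `Match #<number>`, that every part link has the form `catalogItem.asp?P=<part>"`, and that a match's parts always come after its header and before the next one. The names `TOKEN` and `group_parts_by_match` and the shortened `sample` string are made up for illustration.

```python
import re

# One pattern with two alternatives, so a single left-to-right scan sees
# match headers and part links in document order.
# Assumption: part numbers never contain '"' or '&'.
TOKEN = re.compile(r'Match #(\d+)|catalogItem\.asp\?P=([^"&]+)"')


def group_parts_by_match(html):
    """Return {match_number: [part_number, ...]} (hypothetical helper)."""
    matches = {}
    current = None  # match number we are currently inside, if any
    for m in TOKEN.finditer(html):
        match_no, part_no = m.groups()  # exactly one of these is not None
        if match_no is not None:
            current = match_no
            matches.setdefault(current, [])
        elif current is not None:
            # A part link seen before any header is ignored.
            matches[current].append(part_no)
    return matches


# Shortened stand-in for the real file, keeping only the relevant pieces.
sample = (
    '<b>Match #139</b>'
    '<a href="http://www.bricklink.com/catalogItem.asp?P=3822pb01">3822pb01</a>'
    '<a href="http://www.bricklink.com/catalogItem.asp?P=3821pb01">3821pb01</a>'
    '<b>Match #140</b>'
    '<a href="http://www.bricklink.com/catalogItem.asp?P=3822pb02">3822pb02</a>'
    '<a href="http://www.bricklink.com/catalogItem.asp?P=3821pb02">3821pb02</a>'
)

print(group_parts_by_match(sample))
# {'139': ['3822pb01', '3821pb01'], '140': ['3822pb02', '3821pb02']}
```

The idea is to keep track of the "current" match while scanning, which is the step that correlates part numbers with match numbers. Regular expressions are fragile on HTML in general, so this only holds as long as the file keeps the layout shown above.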