The webpage I'm trying to scrape looks something like this:
...
<tr><td colspan=3><BR><div class="list">Foo:</div></td></tr>
<tr><td><img src="/images/2.gif" alt="Main"> <a href="/foo/1/"></A></td><td><a href="/browse/foo/1/">foo1</A></td></tr>
<tr><td><img src="/images/2.gif" alt="Main"> <a href="/foo/2/"></A></td><td><a href="/browse/foo/2/">foo2</A></td></tr>
<tr><td><img src="/images/1.gif" alt="Guest"> <a href="/foo/3/"></A></td><td><a href="/browse/foo/3/">foo3</A></td></tr>
<tr><td colspan=3><BR><div class="list">Bar:</div></td></tr>
<tr><td><img src="/images/1.gif" alt="Guest"> <a href="/bar/1/"></A></td><td><a href="/browse/bar/1/">bar1</A></td></tr>
<tr><td><img src="/images/1.gif" alt="Guest"> <a href="/bar/2/"></A></td><td><a href="/browse/bar/2/">bar2</A></td></tr>
<tr><td><img src="/images/2.gif" alt="Main"> <a href="/bar/3/"></A></td><td><a href="/browse/bar/3/">bar3</A></td></tr>
<tr><td colspan=3>...
I would like to extract the data in the following form:
...
Foo:
foo1
foo2
foo3
Bar:
bar1
bar2
bar3
...
Each section is introduced by a <tr> row containing a single <td colspan=3> cell, which makes it difficult for me to scrape the information. I have tried this method, but I was unable to obtain the data because the heading rows and the data rows all use the same <tr> tag, so nothing in the markup nests a group under its heading.
Is there a sensible way to divide the rows into those sections using BeautifulSoup? Thanks in advance.
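One possible approach, given as a minimal sketch rather than a definitive solution: walk the <tr> rows in document order, treat any row containing <div class="list"> as the start of a new section, and attach the rows that follow to that section. The function name `group_rows`, the surrounding <table> wrapper, and the choice of the built-in "html.parser" backend are assumptions made for illustration; the sample rows are copied from the snippet above.

```python
from bs4 import BeautifulSoup

# Sample rows from the question; the <table> wrapper is an assumption.
html = """
<table>
<tr><td colspan=3><BR><div class="list">Foo:</div></td></tr>
<tr><td><img src="/images/2.gif" alt="Main"> <a href="/foo/1/"></A></td><td><a href="/browse/foo/1/">foo1</A></td></tr>
<tr><td><img src="/images/2.gif" alt="Main"> <a href="/foo/2/"></A></td><td><a href="/browse/foo/2/">foo2</A></td></tr>
<tr><td><img src="/images/1.gif" alt="Guest"> <a href="/foo/3/"></A></td><td><a href="/browse/foo/3/">foo3</A></td></tr>
<tr><td colspan=3><BR><div class="list">Bar:</div></td></tr>
<tr><td><img src="/images/1.gif" alt="Guest"> <a href="/bar/1/"></A></td><td><a href="/browse/bar/1/">bar1</A></td></tr>
<tr><td><img src="/images/1.gif" alt="Guest"> <a href="/bar/2/"></A></td><td><a href="/browse/bar/2/">bar2</A></td></tr>
<tr><td><img src="/images/2.gif" alt="Main"> <a href="/bar/3/"></A></td><td><a href="/browse/bar/3/">bar3</A></td></tr>
</table>
"""


def group_rows(markup):
    """Group link texts under the most recent <div class="list"> heading."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = {}   # heading text -> list of link texts, in page order
    current = None  # heading of the section currently being filled

    for row in soup.find_all("tr"):
        heading = row.find("div", class_="list")
        if heading is not None:
            # A heading row starts a new section.
            current = heading.get_text(strip=True)
            sections[current] = []
        elif current is not None:
            # A data row: keep only links that have visible text,
            # which skips the empty <a href="/foo/1/"></a> anchors.
            for link in row.find_all("a"):
                text = link.get_text(strip=True)
                if text:
                    sections[current].append(text)
    return sections


sections = group_rows(html)
for title, names in sections.items():
    print(title)
    for name in names:
        print(name)
```

Because the grouping depends only on row order, it does not need the groups to be nested in the markup. Rows that appear before the first heading are ignored in this sketch.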