Without more info about your data it is difficult to give you a concise solution that will cover all possible inputs. To help you on your way, here's a walkthrough which will hopefully lead you to a solution that suits your needs.
The following will give us <div id="a"> (there should only be one element with a specific id):
top_div = soup.find('div', {'id':'a'})
We can then proceed to retrieve all inner divs with class='aa' (there may be more than one):
aa_div = top_div.findAll('div', {'class':'aa'})
From there, we can return all links for each div found:
links = [div.findAll('a') for div in aa_div]
Note that links contains a nested list, since div.findAll('a') returns a list of the <a> nodes found in each div. There are various ways to flatten such a list.
Here's an example which iterates through the list and prints out the individual links:
>>> from itertools import chain
>>> for a in chain.from_iterable(links):
...     print(a)
...
<a id="ff" href="#">ff</a>
<a id="gg" href="#">gg</a>
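For reference, the steps above can be put together into one runnable sketch. It assumes BeautifulSoup 4 (the bs4 package, where find_all is the current spelling of findAll) and uses made-up HTML, since the real markup may differ; only the two links are taken from the output shown above.

```python
from itertools import chain

from bs4 import BeautifulSoup

# Hypothetical input: the real markup may differ from this.
html = """
<div id="a">
  <div class="aa">
    <a id="ff" href="#">ff</a>
    <a id="gg" href="#">gg</a>
  </div>
  <div class="bb">
    <a id="hh" href="#">hh</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: the single outer div with id="a".
top_div = soup.find("div", {"id": "a"})

# Step 2: every inner div with class="aa" (there may be several).
aa_divs = top_div.find_all("div", {"class": "aa"})

# Step 3: one list of <a> tags per div, i.e. a nested list.
nested_links = [div.find_all("a") for div in aa_divs]

# Step 4: flatten the nested list into a single list of tags.
flat_links = list(chain.from_iterable(nested_links))

link_ids = [a["id"] for a in flat_links]
print(link_ids)  # ['ff', 'gg']
```

The link under class="bb" is skipped because only the aa divs are searched in step 3.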
The solution presented above is rather long-winded. However, with a better understanding of the input data, a much more compact solution is possible. For example, if the data is exactly as you've shown and there will always be exactly one div with class='aa', then the solution could simply be:
>>> soup.find('div', {'class':'aa'}).findAll('a')
[<a id="ff" href="#">ff</a>, <a id="gg" href="#">gg</a>]
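One caveat with the compact form: find returns None when nothing matches, so chaining .findAll onto it raises an AttributeError on input that lacks the div. Here is a small defensive sketch, assuming BeautifulSoup 4; the helper name links_in_aa and both HTML snippets are made up for illustration.

```python
from bs4 import BeautifulSoup


def links_in_aa(soup):
    """Return the <a> tags inside the first div.aa, or [] if there is none."""
    div = soup.find("div", {"class": "aa"})
    if div is None:
        # No matching div: return an empty list instead of failing.
        return []
    return div.find_all("a")


# Hypothetical inputs, one with the div and one without.
with_div = BeautifulSoup(
    '<div class="aa"><a id="ff" href="#">ff</a></div>', "html.parser"
)
without_div = BeautifulSoup("<p>no matching div</p>", "html.parser")

found = [a["id"] for a in links_in_aa(with_div)]
missing = links_in_aa(without_div)
```

Whether an empty list or an exception is the better behaviour depends on how strict you want to be about unexpected input.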
Using CSS selectors with BeautifulSoup4
If you're using a newer version of BeautifulSoup (version 4), you could also use the .select() method, which provides CSS selector support. The elaborate solution I provided at the beginning of this answer could be rewritten as:
soup.select("div#a div.aa a")
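As a quick check, here is a sketch (BeautifulSoup 4, with made-up HTML) suggesting how the selector narrows things down: it should only pick up links that sit under a div.aa which is itself inside div#a.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two decoys: a sibling div.bb and a second outer div.
html = """
<div id="a">
  <div class="aa"><a id="ff" href="#">ff</a><a id="gg" href="#">gg</a></div>
  <div class="bb"><a id="hh" href="#">hh</a></div>
</div>
<div id="b">
  <div class="aa"><a id="zz" href="#">zz</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinators: <a> inside div.aa inside div#a.
selected = soup.select("div#a div.aa a")

selected_ids = [a["id"] for a in selected]
print(selected_ids)  # ['ff', 'gg']
```

The result is already a flat list, so no chain.from_iterable step is needed here.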
For BeautifulSoup v3, you can add on this functionality using soupselect.
However, do note the following statement from the docs (emphasis mine):
This is a convenience for users who know the CSS selector syntax. You can do all this stuff with the Beautiful Soup API. And if CSS selectors are all you need, you might as well use lxml directly, because it’s faster. But this lets you combine simple CSS selectors with the Beautiful Soup API.