I am trying to parse all the elements under a div using BeautifulSoup. The issue is that I don't know in advance which elements the div contains. For example, a div can have text in paragraphs and bullet points, along with some href
elements. Each URL that I open can have different elements underneath the specific div class that I am looking at.
Example:
URL A can have the following:
<div class='content'>
<p> Hello I have a link </p>
<li> I have a bullet point
<a href="foo.com">foo</a>
</div>
but URL B can have:
<div class='content'>
<p> I only have paragraph </p>
</div>
I started with something like this:
content = souping_page.body.find('div', attrs={'class': 'content'})
but I am a little confused about how to go beyond this. I was hoping to create one string from all the parsed data as the end result.
In the end, I want to obtain the following strings from each example:
Example 1: Final Output
parse_data = Hello I have a link I have a bullet point
parse_links = foo.com
Example 2: Final Output
parse_data = I only have paragraph
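
To make the goal concrete, here is a rough sketch of the kind of thing I have in mind. It is not a settled solution: the function name parse_content is made up for illustration, it assumes the built-in "html.parser" backend, and it assumes link text (such as "foo") should be left out of parse_data, as in the expected output above.

```python
from bs4 import BeautifulSoup


def parse_content(html):
    """Hypothetical helper: return (parse_data, parse_links) for the 'content' div."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="content")
    if content is None:
        # No matching div on this page.
        return "", []

    # Collect every href first, whatever else the div happens to contain.
    parse_links = [a["href"] for a in content.find_all("a", href=True)]

    # Assumption: link text should not appear in parse_data, so drop the <a> tags.
    for a in content.find_all("a"):
        a.decompose()

    # get_text walks all remaining child elements without needing to know them
    # in advance; split/join collapses leftover whitespace into single spaces.
    parse_data = " ".join(content.get_text(" ", strip=True).split())
    return parse_data, parse_links


example_1 = """
<div class='content'>
<p> Hello I have a link </p>
<li> I have a bullet point
<a href="foo.com">foo</a>
</div>
"""

example_2 = """
<div class='content'>
<p> I only have paragraph </p>
</div>
"""

data_1, links_1 = parse_content(example_1)
data_2, links_2 = parse_content(example_2)
print(data_1, links_1)
print(data_2, links_2)
```

The idea is that get_text handles whatever mix of paragraphs and bullets is present, so the code does not need to know the child elements ahead of time. If the link text should stay in parse_data, the decompose loop could simply be removed.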