I am a beginner in Python. I am currently using BeautifulSoup to scrape a website.
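import urllib.request
import bs4 as bs

url = ''  # my_url
source = urllib.request.urlopen(url)
soup = bs.BeautifulSoup(source, 'lxml')
match = soup.find('article', class_='xyz')
text = ''  # separate accumulator, so the URL is not prepended to the output
for paragraph in match.find_all('p'):
    text += paragraph.text + "\n"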
My tag structure:
<article class="xyz" >
<h4>dr</h4>
<p>efkl</p>
<h4>dr</h4>
<p>efkl</p>
<h4>dr</h4>
<p>efkl</p>
<h4>dr</h4>
<p>efkl</p>
</article>
Currently I only get the paragraphs:
efkl
efkl
efkl
efkl
The output I want (headings as well as paragraphs):
dr
efkl
dr
efkl
dr
efkl
dr
efkl
I want the output to contain the headings as well, each heading before its paragraph, as in the original HTML. How can I modify the code to do that?
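A minimal sketch, assuming the page's markup matches the structure shown above: `find_all` accepts a list of tag names, so asking for `['h4', 'p']` returns both kinds of tag in the order they appear in the HTML, which keeps each heading in front of its paragraph. The function name `extract_headings_and_paragraphs` is hypothetical, and the sample uses Python's built-in `'html.parser'` on an inline string so it runs without a network request; `'lxml'` should behave the same way here if it is installed.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in the real script this would be
# the response from urllib.request.urlopen(url).
html = """
<article class="xyz" >
<h4>dr</h4>
<p>efkl</p>
<h4>dr</h4>
<p>efkl</p>
<h4>dr</h4>
<p>efkl</p>
<h4>dr</h4>
<p>efkl</p>
</article>
"""


def extract_headings_and_paragraphs(markup):
    """Hypothetical helper: return the text of every <h4> and <p> inside
    <article class="xyz">, one per line, in document order."""
    soup = BeautifulSoup(markup, 'html.parser')
    article = soup.find('article', class_='xyz')
    if article is None:
        return ''  # no matching article on the page
    # A list of names matches any of them; find_all walks the tree in
    # document order, so headings and paragraphs stay interleaved.
    return ''.join(tag.get_text() + "\n" for tag in article.find_all(['h4', 'p']))


text = extract_headings_and_paragraphs(html)
print(text, end='')
```

No sorting step is needed because the results already follow the page order; if the article also used other heading levels, they could be added to the list (for example `['h3', 'h4', 'p']`).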