I have got an HTML file and I read with Python and I would like to while I print customize it.
First I've to print Country name then players name which they belong to their country.
My HTML file looks like this:
<ul>
<li>
Australia
<ol>
<li>Steve Smith</li>
<li>David Warner</li>
<li>Aaron Finch</li>
</ol>
</li>
<li>
Bangladesh
<ol>
<li>Shakib Al Hasan</li>
<li>Tamim Iqbal</li>
<li>Mushfiqur Rahim</li>
</ol>
</li>
<li>
England
<ol>
<li>Ben Stokes</li>
<li>Joe Root</li>
<li>Eoin Morgan</li>
</ol>
</li>
Now I want to scrape this data from my HTML file:
Australia - Steve Smith, David Warner, Aaron Finch
Bangladesh - Shakib Al Hasan, Tamim Iqbal, Mushfiqur Rahim
England - Ben Stokes, Joe Root, Eoin Morgan
But I can only scrape with Players' name. This is my code:
import re
file_name = "team.html"
mode = "r"
with open(file_name, mode) as fp:
team = fp.read()
pat = re.compile(r'<li>(.*?)</li>')
result = pat.findall(team)
res = ", ".join([str(player) for player in result])
print(res)
Also, I don't' use any package like bs4. I would like to solve this issue by using regex.