OK, so I have dozens of HTML files containing website source code, and I need to scrape them for names and email addresses.
Each file has hundreds of lines that look like this:
<ul class="specialfaa-results">
<li >
<div class="summary-heading">
<h3 class="adviser-name">Mr Joe Bloggs </h3><p class="distance">0.1mi</p>
<div class="clearboth"></div>
<p class="adviser-company mod-content">Joe Bloggs Company Ltd</p>
</div>
<div class="full-profile mg-tp-10" style="display:none; margin-left:3px;">
<div class="mod-content">
<div class="fl-lf yui3-u-1-3">
<div class="yui3-u adv-item adv-map">
<a href="#mapcontainer" class="showGoogle" lng="-1.9111053" lat="52.4771906" title="Business">
</a>
</div>
</div>
<div class="fl-lf yui3-u-2-5">
<div class="yui3-u adv-item adv-email">
<a href="mailto:joe.bloggs@hello.co.uk">mailto:joe.bloggs@hello.co.uk</a>
</div>
<div class="yui3-u adv-item adv-webpage">
<a href="http://www.joebloggs.co.uk"
My thinking is that I need to isolate the names and email addresses using Python, or perhaps Excel. I want to end up with an Excel document with the headings 'Name' (e.g. 'Joe Bloggs') and 'Email address' (e.g. 'joe.bloggs@hello.co.uk'). What kind of code or process should I use to extract these?
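To make the goal concrete, here is a rough sketch of the sort of thing I imagine, using only Python's standard library (`html.parser` and `csv`). It assumes every name sits in an `<h3 class="adviser-name">` and every email in a `mailto:` link that follows it, as in the sample above. The function names, the list of titles to strip, and the CSV output (which Excel can open) are my own placeholders, not anything taken from the site.

```python
import csv
from html.parser import HTMLParser

# Assumed list of titles to drop from the front of a name (placeholder).
TITLES = {"Mr", "Mrs", "Ms", "Miss", "Dr"}


class AdviserParser(HTMLParser):
    """Pair each adviser-name heading with the next mailto: link."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished (name, email) pairs
        self._in_name = False  # True while inside <h3 class="adviser-name">
        self._name = ""        # text of the most recent name heading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href") or ""
        if tag == "h3" and "adviser-name" in classes:
            self._in_name = True
            self._name = ""
        elif tag == "a" and href.startswith("mailto:"):
            self.rows.append((clean_name(self._name), href[len("mailto:"):]))
            self._name = ""

    def handle_data(self, data):
        if self._in_name:
            self._name += data

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_name = False


def clean_name(raw):
    """Trim whitespace and drop a leading title such as 'Mr'."""
    parts = raw.strip().split(None, 1)
    if len(parts) == 2 and parts[0].rstrip(".") in TITLES:
        return parts[1]
    return raw.strip()


def extract_advisers(html_text):
    """Return a list of (name, email) tuples found in one HTML document."""
    parser = AdviserParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def save_csv(rows, path):
    """Write the rows to a CSV file with the two headings I want."""
    # utf-8-sig adds a BOM so Excel detects the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Email address"])
        writer.writerows(rows)
```

The idea would be to call `extract_advisers` on the text of each HTML file, collect all the rows into one list, and pass that list to `save_csv` once at the end.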
Thanks, guys! I'm fairly new to this kind of thing and to this site, so any help would be hugely appreciated.
Hugh.