
I have dozens of HTML files containing website source code, and I need to scrape them for names and email addresses.

Each file contains hundreds of blocks that look like this:

              <ul class="specialfaa-results">

                        <li >
                            <div class="summary-heading">
                                <h3 class="adviser-name">Mr Joe Bloggs </h3><p class="distance">0.1mi</p>
                                <div class="clearboth"></div>
                                <p class="adviser-company mod-content">Joe Bloggs Company Ltd</p>
                            </div>


                            <div class="full-profile mg-tp-10" style="display:none; margin-left:3px;">
                                <div class="mod-content">

                                    <div class="fl-lf yui3-u-1-3">
                                                  <div class="yui3-u adv-item adv-map">
                                                      <a href="#mapcontainer" class="showGoogle" lng="-1.9111053" lat="52.4771906" title="Business">

                                                      </a>
                                                  </div>
                                    </div>

                                    <div class="fl-lf yui3-u-2-5">
                                            <div class="yui3-u adv-item adv-email">
                                                <a href="mailto:joe.bloggs@hello.co.uk">mailto:joe.bloggs@hello.co.uk</a>
                                            </div>
                                        <div class="yui3-u adv-item adv-webpage">
                                            <a href="http://www.joebloggs.co.uk" 

My thinking is that I need to isolate the names and email addresses using Python or perhaps Excel. I want to end up with an Excel document with the headings 'Name' (e.g. 'Joe Bloggs') and 'email address' (e.g. joe.bloggs@hello.co.uk). What kind of code or process should I use to get these?

Thanks guys! I'm fairly new to this kind of thing and to this site, so any help would be hugely appreciated.

Hugh.

Hugh
  • Hi Hugh, welcome. Before asking a question, please read [How to Ask](http://stackoverflow.com/help/how-to-ask). I suggest you search online first and come back if you get stuck. You should also describe what you have tried so far and what didn't work for you. – Nagaraj Tantri Jun 09 '15 at 13:31
  • Try extracting the email addresses with a regex: http://stackoverflow.com/questions/28888194/extract-emails-from-html-using-regex – Dmitrij Holkin Jun 09 '15 at 13:57

1 Answer


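Try extracting the email addresses with a regex. See [Extract emails from html using regex](http://stackoverflow.com/questions/28888194/extract-emails-from-html-using-regex) and this gist:

https://gist.github.com/dideler/5219706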
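For the markup in the question, a minimal sketch of the regex approach might look like the following. It is not taken from the linked gist. It assumes every adviser sits in its own `<li>` block, that the name is always inside `<h3 class="adviser-name">`, and that the email always appears as a `mailto:` link. The function names `extract_advisers` and `write_csv`, the list of titles to strip, and the choice of CSV output (which Excel can open) are all assumptions made for illustration.

```python
import csv
import re
from html import unescape

# Patterns are assumptions based on the markup shown in the question.
NAME_RE = re.compile(r'<h3 class="adviser-name">\s*(.*?)\s*</h3>', re.S)
EMAIL_RE = re.compile(r'href="mailto:([^"?]+)')
# Hypothetical list of titles to strip so 'Mr Joe Bloggs' becomes 'Joe Bloggs'.
TITLE_RE = re.compile(r'^(Mr|Mrs|Ms|Miss|Dr)\.?\s+', re.I)


def extract_advisers(html):
    """Return (name, email) pairs, one per <li> block that contains both."""
    rows = []
    # Split on list items so each name is paired with its own email.
    for block in re.split(r'<li\b', html)[1:]:
        name = NAME_RE.search(block)
        email = EMAIL_RE.search(block)
        if name and email:
            clean_name = TITLE_RE.sub('', unescape(name.group(1)).strip())
            rows.append((clean_name, email.group(1).strip()))
    return rows


def write_csv(rows, path):
    """Write the pairs to a CSV file with the headings from the question."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Name', 'email address'])
        writer.writerows(rows)


SAMPLE = '''
<ul class="specialfaa-results">
  <li >
    <h3 class="adviser-name">Mr Joe Bloggs </h3>
    <div class="yui3-u adv-item adv-email">
      <a href="mailto:joe.bloggs@hello.co.uk">mailto:joe.bloggs@hello.co.uk</a>
    </div>
  </li>
</ul>
'''

if __name__ == '__main__':
    print(extract_advisers(SAMPLE))
    # → [('Joe Bloggs', 'joe.bloggs@hello.co.uk')]
```

To process all the files, you could read each one, call `extract_advisers` on its contents, collect the rows into one list, and pass that list to `write_csv`. Regexes are fragile against markup changes, so if the pages vary much, a real HTML parser is likely the safer choice.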

Dmitrij Holkin