
Hello guys, I need some help. I want to scrape the e-mail address from this website: https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing

I have a problem with these inspected elements: the e-mail address doesn't show up in my code when I run the program:

<div class="col-xs-12">
  <div class="separator-rule heading"></div>
  <h4>Provider Main Contact</h4>
  "
                              Klaus Stephan"
  <br>
  "
                              +49++49 16091922165"
  <br>
  "
                              president@pmicc.de
                          "
</div>

Does anyone know how to extract the e-mail address from this? Thanks for the help, guys.

2 Answers


To get the e-mails from the "Provider Main Contact" block, you can use this example:

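import requests
from bs4 import BeautifulSoup


url = 'https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The <div> whose direct <h4> child mentions "Provider Main Contact"
# (:-soup-contains is the current name of the deprecated :contains)
main_contact_block = soup.select_one('div:has(> h4:-soup-contains("Provider Main Contact"))')
if main_contact_block is None:
    raise SystemExit('"Provider Main Contact" block not found')

# Keep only the text nodes of that block that contain an '@'
emails = [text.strip() for text in main_contact_block.find_all(string=True) if '@' in text]
print(emails)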

Prints:

['president@pmicc.de']
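To check the selector without downloading the page, here is a minimal sketch that parses the markup from the question as a string. It assumes the live page has the same structure; the `html` variable is a hand-written copy of the inspected snippet, without the quote marks the browser's inspector draws around text nodes.

```python
from bs4 import BeautifulSoup

# Hand-written copy of the inspected markup from the question
# (assumption: the live page uses the same structure)
html = """
<div class="col-xs-12">
  <div class="separator-rule heading"></div>
  <h4>Provider Main Contact</h4>
  Klaus Stephan
  <br>
  +49++49 16091922165
  <br>
  president@pmicc.de
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Same idea as above: the <div> whose direct <h4> child mentions the heading
block = soup.select_one('div:has(> h4:-soup-contains("Provider Main Contact"))')

# Every text node in the block, stripped, keeping only those with an '@'
emails = [text.strip() for text in block.find_all(string=True) if '@' in text]
print(emails)
```

If this prints the address but the live request does not, the difference lies in the HTML the server returns, not in the selector.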
Andrej Kesely

You can also use regular-expression matching to search the page text for an e-mail address.

A very thorough expression can be found in this discussion: How to validate an email address using a regular expression?

import re
import requests
from bs4 import BeautifulSoup
url = 'https://ccrs.pmi.org/search/course-provider/1000000396?courseID=472010&courseName=Agile%20for%20Marketing'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# RFC 5322-style pattern from the linked discussion; a raw string keeps the backslashes intact for the regex engine
pattern = r'(?:[a-z0-9!#$%&\'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'
match = re.search(pattern, soup.get_text())
print(match[0] if match else 'No e-mail found')

Prints:

president@pmicc.de
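The full RFC 5322 pattern is more than most scraping tasks need. A simpler, deliberately loose pattern (my own assumption, not the one from the linked discussion) is often enough, and `re.findall` returns every address rather than only the first. This sketch runs it on the text of the contact block from the question, hard-coded so it works offline.

```python
import re

# Text of the "Provider Main Contact" block as shown in the question
text = """
Provider Main Contact
Klaus Stephan
+49++49 16091922165
president@pmicc.de
"""

# Loose pattern: word characters, dots, '+' and '-' before the '@',
# then one or more dot-separated domain labels after it.
# It accepts some invalid addresses and rejects some exotic valid ones.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

emails = EMAIL_RE.findall(text)
print(emails)
```

On the live page you would pass `soup.get_text()`, or better the text of the contact block only, instead of the hard-coded string, so addresses elsewhere on the page are not picked up.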
Marc