0

I want to get every email address of 1000 webpages using Python's Selenium.

My idea:

go to page x

a = driver.page_source

get the text of a that contains @

But however I cant get that part from a.

1 Answers1

1

You can get a list of the links this way:

links = [elem.get_attribute('href') for elem in elems]

where elems is a driver.find_elements_by_...() returned value, for example:

elem = driver.find_elements_by_css_selector('a') # You need <a> tags if you want to be sure to find href attribute

You can check if it's an email this way:

def isMail(link: str):
    if ('mailto:' in link):
        return True
    return False

So

mails = [link.removeprefix('mailto:') for link in links if isMail(link)]

I would suggest to read also this and this.

FLAK-ZOSO
  • 3,873
  • 4
  • 8
  • 28