I want to collect every email address from 1000 webpages using Selenium in Python.
My idea:
go to page x
a = driver.page_source
get the text of a that contains @
However, I can't work out how to extract that part from a.
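You can get a list of the links this way:
links = [elem.get_attribute('href') for elem in elems]
where elems is the value returned by one of the driver.find_elements_by_...() methods, for example:
elems = driver.find_elements_by_css_selector('a')  # only <a> tags are expected to carry an href attribute
In Selenium 4.3 and later the find_elements_by_* helpers no longer exist; the equivalent call is driver.find_elements(By.CSS_SELECTOR, 'a'), with By imported from selenium.webdriver.common.by.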
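You can check whether a link is an email link this way:
def isMail(link):
    # get_attribute('href') returns None for <a> tags that have no href
    return link is not None and link.startswith('mailto:')
So
mails = [link.removeprefix('mailto:') for link in links if isMail(link)]
Note that str.removeprefix() requires Python 3.9 or later.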
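The mailto: approach only finds addresses that are wrapped in links. The idea from the question, searching driver.page_source for text that contains @, could be sketched with a regular expression as below. This is a minimal sketch, not a full address validator: the pattern EMAIL_RE and the helper extract_emails are hypothetical names introduced here, and the pattern is deliberately simple, so it will miss some valid addresses and accept some invalid ones.

```python
import re

# Assumption: a deliberately simple pattern for "something@domain.tld".
# It is not a complete implementation of the email address grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the unique email-like strings in html, in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(html):
        # Compare case-insensitively, but keep the first spelling that was found.
        seen.setdefault(match.lower(), match)
    return list(seen.values())
```

With a live driver this would be called as extract_emails(driver.page_source) after each driver.get(url), collecting the results across all pages in a set or list.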