
I have working code that waits for elements on the page, like this:

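    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # this snippet runs inside a loop, hence the `continue` below
    wait = WebDriverWait(driver, 60)
    try:
        imo_giris = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[@id='P_ENTREE_HOME']")))
        imo_giris.send_keys(imo, "\n")  # type the value of `imo` and press Enter
    except TimeoutException:
        print("None")
        driver.close()
        continue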

How can I integrate WebDriverWait into my code that searches the page source for email addresses with a regex? Here is that code:

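    import re

    from googlesearch import search  # from the "google" package on PyPI
    from selenium import webdriver

    results = []
    for query in my_list:  # my_list holds the search queries
        # search() returns a generator of result URLs
        results.append(search(query, tld="com", num=3, stop=3, pause=2))

    for result in results:
        url = list(result)
        print(*url, sep='\n')
        for site in url:
            driver = webdriver.Chrome()
            try:
                driver.get(site)
                doc = driver.page_source
                emails = re.findall(r'[\w\.-]+@[\w\.-]+', doc)
                for email in emails:
                    print(email)
            finally:
                driver.quit()  # close the browser even if the page fails to load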

I can find emails in the page source, but sometimes the website is down, or the search takes a very long time because the page source is huge. I want to limit the email search to 10 seconds. How can I do that?

I solved the problem by replacing the regex with a better one. This is the regex I'm using now, and it works fine:

    r'\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,4}\b'
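A minimal sketch of using the updated pattern, assuming you only want unique addresses in order of first appearance; the helper name `extract_emails` and the sample text are hypothetical. A likely reason the new pattern is much faster (an explanation, not something confirmed in this thread): with `[\w\.-]+@`, `re.findall` starts a new attempt at every position inside a long run of word characters and rescans the rest of the run each time, which is quadratic in the run's length, whereas the leading `\b` makes an attempt fail immediately at positions in the middle of a word.

```python
import re

# The pattern from the update above, compiled once and reused for every page.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,4}\b')


def extract_emails(page_source: str) -> list[str]:
    """Return unique email-like strings in order of first appearance (hypothetical helper)."""
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(EMAIL_RE.findall(page_source)))


sample = "Contact info@example.com or sales@example.org. Again: info@example.com"
print(extract_emails(sample))  # ['info@example.com', 'sales@example.org']
```

Note that `[A-Za-z]{2,4}` will not match addresses whose top-level domain is longer than four letters, such as `.museum`.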
E.MRZ

1 Answer


You could create a custom expected condition, but that seems like a bit of overkill. Instead, you can use a simple loop with a time check:

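    import re
    import time

    ...
    emails = []
    end_time = time.time() + 10
    # re-read the page source on each pass, in case the page is still loading
    while time.time() < end_time and not emails:
        doc = driver.page_source
        emails = re.findall(r'[\w\.-]+@[\w\.-]+', doc)
    # the deadline is only checked between passes, so one slow
    # re.findall() call on a huge page can still exceed 10 seconds
    print(emails)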
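The loop above is a hand-rolled version of what `WebDriverWait.until()` does: call a condition repeatedly until it returns something truthy or the timeout expires. Below is a pure-Python sketch of that polling pattern; `poll_until` is a hypothetical helper, not part of Selenium. Like the loop above, it only checks the deadline between calls, so it cannot cut short a single slow `re.findall()`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(condition: Callable[[], T], timeout: float, interval: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper mirroring WebDriverWait.until(); raises TimeoutError on expiry.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()  # each call runs to completion; it is never interrupted
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With Selenium itself, the equivalent is to pass a callable to `until`, which receives the driver, e.g. `WebDriverWait(driver, 10).until(lambda d: re.findall(r'[\w\.-]+@[\w\.-]+', d.page_source))`. An empty list is falsy, so it keeps polling, and it raises `TimeoutException` if no email appears within 10 seconds.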
Guy
  • Sometimes the rest of the code fails because some websites don't open. What can I do about this? – E.MRZ Dec 22 '21 at 11:49
  • I just tried it; after 10 seconds it still keeps waiting. :( – E.MRZ Dec 22 '21 at 11:54
  • @aabb *the websites are not opened* sounds like an issue with the websites themselves, but I'm guessing. Please post a new question with all the details. – Guy Dec 22 '21 at 11:58
  • @aabb It might have moved to the next loop iteration, but this code won't wait more than 10 seconds. – Guy Dec 22 '21 at 11:59
  • Yes, I get an error when there is a problem with the website itself, and I want to prevent it. Also, I may not have explained the other problem well: the page source of some websites is so large that it keeps looking for an email for about 5-10 minutes. Unfortunately, your code didn't prevent this; it's still the same. – E.MRZ Dec 22 '21 at 14:38
  • Example of a website with long page source: https://www.dnb.com/business-directory/company-profiles.tsakos_columbia_shipmanagement_tcm_sa.68c05665cccea60d1416d4377616378a.html – E.MRZ Dec 22 '21 at 14:39
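For sites that never open, one option is to treat a failed page load as "no emails" and move on instead of letting the whole loop crash. This is a minimal sketch that assumes the page fetch is passed in as a function so the error handling can be shown without a browser; `collect_emails` and `fetch_source` are hypothetical names. With Selenium, `fetch_source` could call `driver.get(url)` and return `driver.page_source` after `driver.set_page_load_timeout(10)` has been set once. A load that exceeds the timeout raises `TimeoutException`, and unreachable sites typically raise `WebDriverException`, so `errors=(WebDriverException,)` covers both because `TimeoutException` is a subclass of it.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple, Type

EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,4}\b')


def collect_emails(
    urls: Iterable[str],
    fetch_source: Callable[[str], str],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Dict[str, List[str]]:
    """Map each URL to the emails found on it, skipping URLs whose fetch fails."""
    found: Dict[str, List[str]] = {}
    for url in urls:
        try:
            source = fetch_source(url)
        except errors as exc:
            # e.g. the site is down or the page load timed out
            print(f"skipping {url}: {exc}")
            continue
        found[url] = EMAIL_RE.findall(source)
    return found
```

Catching only the exception types you expect (rather than the broad default) keeps genuine bugs in your own code from being silently skipped.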