
I have an Excel sheet containing names in the first column and organizations in the third column.
Based on the names in this sheet, the email addresses should be scraped from this URL:
https://directory.gatech.edu/

I am using Selenium.
I wrote this script:

import selenium.webdriver


def scrape(name):
    url = 'https://directory.gatech.edu/'

    driver = selenium.webdriver.Chrome("mypython/bin/chromedriver_linux64/chromedriver")
    driver.get(url)

    # Type the name into the search box and submit the form.
    driver.find_element_by_xpath('//*[@id="edit-search"]').send_keys(name)
    driver.find_element_by_xpath('//*[@id="edit-submit"]').click()


# --- main ---
scrape("Tariq")

But this URL asks a question to prove that you are not a robot before it gives access to the data.
How can I pass that automatically, so that I can then scrape the email?

stack
  • Why don't you locate the element, use a regular expression to extract the numbers, and then fill in the result? – Orestis Zekai Feb 26 '20 at 09:45
  • @OrestisZekai I didn't understand. There are mathematical calculations in it. How can we automate that in the script? Without that, the results won't come. – stack Feb 26 '20 at 09:46

2 Answers


The obstacle you have run into was put there intentionally to prevent exactly what you are trying to do, namely automated access to that data.

Even if you find a programmatic way around something designed specifically to keep programs out (and I doubt anyone on StackOverflow will help you with that), doing so clearly goes against the intended use of that site.

I assume you asked because you did not realise this, so I consider this an answer to your problem. Even if you did not realise that your problem is really about understanding the purpose of the obstacle, the solution is still simply not to try.

In short: what you are attempting is unwanted by the site owners.
What you should do is stop trying.

RubberBee
    No. Your question is OK. It asks about a programming problem. It does, however, clearly show that you are trying something you should not. I think that telling you not to try is an answer to your question. Other answers proposing how to do this automatically seem theoretically possible, but maybe the community is not willing to provide them. It might also require some ingenious level of artificial intelligence, i.e. on the level of successfully fooling something which specifically tries to tell AI from humans... – RubberBee Feb 26 '20 at 10:04

To solve the captcha test within the website https://directory.gatech.edu/ using Selenium you can use the following Locator Strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get('https://directory.gatech.edu/')
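    # Read the captcha question from the label attached to the captcha input.
    my_string = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "label[for='edit-captcha-test']"))).get_attribute("innerHTML")
    # Keep the first three tokens: operand, operator, operand.
    chars = my_string.split()[:3]
    # Evaluate the expression and type the result into the captcha field.
    # Note: eval runs text taken from the page.
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[id='edit-captcha-test']"))).send_keys(str(eval(' '.join(chars))))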
    

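As an alternative to eval, which runs whatever text the page puts in the label, the arithmetic can be parsed explicitly. This is only a sketch: solve_captcha is a hypothetical helper, and it assumes the label contains a single expression of two non-negative integers joined by +, - or * (the code above only implies an "operand operator operand" layout, so the set of operators is a guess).

```python
import operator
import re

# Operators the sketch understands; the real page may use fewer or more.
_OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# First "<int> <op> <int>" occurrence anywhere in the text.
_EXPRESSION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_captcha(label_text):
    """Return the integer result of the first simple expression in label_text."""
    match = _EXPRESSION.search(label_text)
    if match is None:
        raise ValueError("no arithmetic expression found in: %r" % label_text)
    left, symbol, right = match.groups()
    return _OPERATORS[symbol](int(left), int(right))
```

The result could then be passed to send_keys as str(solve_captcha(my_string)) in place of the eval call.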

Update

To set the name as Tariq in the First name field and solve the captcha test you can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get('https://directory.gatech.edu/')
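    # Fill in the First name field.
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#edit-firstname"))).send_keys("Tariq")
    # Read the captcha question from the label attached to the captcha input.
    my_string = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "label[for='edit-captcha-test']"))).get_attribute("innerHTML")
    # Keep the first three tokens: operand, operator, operand.
    chars = my_string.split()[:3]
    # Evaluate the expression and type the result into the captcha field.
    # Note: eval runs text taken from the page.
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[id='edit-captcha-test']"))).send_keys(str(eval(' '.join(chars))))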
    

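The question starts from a spreadsheet with names in the first column and organizations in the third. A minimal sketch of pulling those two columns out of already-loaded rows follows. names_and_orgs and its parameters are hypothetical, and whether the sheet has a header row is an assumption (controlled by skip_header).

```python
def names_and_orgs(rows, name_col=0, org_col=2, skip_header=True):
    """Yield (name, organization) pairs from an iterable of row tuples.

    Column indexes are zero-based: 0 is the first column, 2 is the third.
    Rows that are too short or have an empty name are skipped.
    """
    for index, row in enumerate(rows):
        if skip_header and index == 0:
            continue  # assumed header row
        if len(row) <= max(name_col, org_col):
            continue  # row does not reach the organization column
        name = row[name_col]
        if name is None or not str(name).strip():
            continue  # empty name cell
        org = row[org_col]
        yield str(name).strip(), ("" if org is None else str(org).strip())
```

With openpyxl, for example, the rows could come from load_workbook(path).active.iter_rows(values_only=True), where path is a placeholder for the spreadsheet file. Each yielded name could then replace the hard-coded "Tariq" above.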
undetected Selenium
  • Can I use this code after the code I gave in the question to scrape the email for a particular name? – stack Feb 26 '20 at 11:27
  • I pasted your code after mine, and now I get an error in my own code that I did not get before: driver.find_element_by_xpath('//*[@id="edit-search"]').send_keys(name) not found – stack Feb 26 '20 at 11:41
  • @stack Simply copy/paste the code and try it in your IDE to verify whether the captcha question is answered properly. – undetected Selenium Feb 26 '20 at 11:49
  • Yes, I tried it and it works fine, thank you. But can you say why it does not work with the code from the question? I need both the name and the captcha. – stack Feb 26 '20 at 11:53
  • @stack Check out the updated answer and let me know the status. – undetected Selenium Feb 26 '20 at 12:02
  • It worked fine, thank you. Can you suggest any tutorials or books for learning Selenium in depth, like you? – stack Feb 26 '20 at 12:06
  • Yes, I accepted it. Can you suggest any books or tutorials for learning Selenium, please, @Debanjan? – stack Feb 26 '20 at 12:09
  • See, there are a lot of tutorials in the form of websites and videos. Unfortunately, the majority of them are either outdated or do not provide an optimal solution. At best I can share the only website I started from. Whatever I know I have shared here on Stack Overflow. Search the _frequently asked_ questions tab. You can find it all. – undetected Selenium Feb 26 '20 at 12:17
  • Great. Why don't you start a room for Selenium on Stack Overflow? It would be useful for everyone, @Debanjan. – stack Feb 26 '20 at 12:19
  • @stack I can do that at any point in time, but then we will need more contributors who can dig deeper into real-time issues. I hope I can expect a helping hand from you in the coming days. – undetected Selenium Feb 26 '20 at 12:21
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/208566/discussion-between-stack-and-debanjanb). – stack Feb 26 '20 at 12:25
  • @DebanjanB Are you there? – gig Feb 27 '20 at 04:43
  • @G.Lakshmi Yes, do you have a question? – undetected Selenium Feb 27 '20 at 06:11