
I want to scrape the 'href' attribute of links on a webpage; each link is the profile URL of a name searched on the website. A search may also return no result if no profile exists for that name. I am using Python Selenium: I read names from a CSV file and, in a loop, send each one to the search bar on the website. Sometimes, though, the profile URL from the previous search gets carried over to the current name. This happens at random; I have checked the logic of the code several times and found no error in that part.

I suspect that the page has not finished loading when I locate the element with Selenium. I have tried sleep(), but it works only for some values and only some of the time. Increasing the sleep time only makes the run slower, with no guarantee of accuracy (tried and tested).

What I actually want is a way to check whether the person's URL exists on the page and, if it does, to get the URL of that specific person rather than the previous one. Is there a solution to this? Here is a small block of code for clarity:

# unique result with name
name = '"' + row[1] + '"'
xpath = "//*[@class='search-result__image-wrapper']/a"
search_query.send_keys(name)
search_query.send_keys(Keys.RETURN)
sleep(5)
#WebDriverWait(driver, timeout).until(EC.presence_of_element_located((By.XPATH, xpath)))
links = driver.find_elements(By.XPATH, xpath)
# keep the URL only when the search returns exactly one profile
if len(links) == 1:
    for link in links:
        url = link.get_attribute('href')
        print(name, url)

P.S.: I have gone through similar questions on Stack Overflow, but none of the solutions seem to work. I have also tried WebDriverWait to wait for a specific element that appears on every search, but that does not seem to work either.

aim_bee.1

1 Answer


Why not wait until the first item in the result list is present, then fetch the full list and iterate over it? See the code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

name = '"' + row[1] + '"'
css_first_name = ".search-result__image-wrapper > a:nth-child(1)"
css_name_list = ".search-result__image-wrapper > a"
search_query.send_keys(name)
search_query.send_keys(Keys.RETURN)
# wait up to 30 s for the first result link to be present in the DOM
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, css_first_name)))
links = driver.find_elements(By.CSS_SELECTOR, css_name_list)
for link in links:
    url = link.get_attribute('href')
    print(name, url)
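
One possible reason a presence wait alone does not help: if the site updates the results in place, the links from the previous search are still in the DOM when the wait runs, so it passes immediately and the old URL is read. Below is a minimal sketch of one workaround, assuming that is what happens here. It takes a snapshot of the hrefs before searching, then polls until the list differs from that snapshot and stays the same for two consecutive polls. `wait_for_new_hrefs` and `fetch_hrefs` are hypothetical names for illustration, not part of Selenium.

```python
import time


def wait_for_new_hrefs(fetch_hrefs, previous_hrefs, timeout=10.0, poll=0.25):
    """Poll fetch_hrefs() until it differs from previous_hrefs and is stable.

    Returns the new list of hrefs (possibly empty), or None on timeout.
    """
    deadline = time.monotonic() + timeout
    last = None
    while True:
        current = fetch_hrefs()
        # Accept only a snapshot that is not the old one and that was also
        # seen on the previous poll, so a half-rendered page is not accepted.
        if current != previous_hrefs and current == last:
            return current
        last = current
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Hypothetical wiring with Selenium (driver, search_query, name and
# css_name_list come from the code above):
#
#   def fetch_hrefs():
#       links = driver.find_elements(By.CSS_SELECTOR, css_name_list)
#       return [link.get_attribute('href') for link in links]
#
#   previous = fetch_hrefs()
#   search_query.send_keys(name)
#   search_query.send_keys(Keys.RETURN)
#   hrefs = wait_for_new_hrefs(fetch_hrefs, previous)
```

A return value of None only says the list never changed within the timeout. That also happens when two consecutive searches legitimately give the same result (for example, no results both times), so it may need separate handling. While the page is re-rendering, `fetch_hrefs` may also have to catch `StaleElementReferenceException` and return the previous snapshot.
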
Mesut GUNES
  • The code actually does more than this. It checks name, name+education, name+career title, and name+skills, in that order. If I get a unique result for one of these inputs, I store its URL; otherwise I store N/A. The way you suggested might not work if there are no results at all: Selenium would raise NoSuchElementException and not execute the other parts of the code that query the other combinations. What would be the way around that? – aim_bee.1 Jun 02 '20 at 12:38
  • If your search result is empty, the wait raises a TimeoutException, which you should handle with a try block. If you need more elements inside the search, that is up to you; this is a basic form of what you want. – Mesut GUNES Jun 02 '20 at 12:42
  • This doesn't seem to work. You can see that I have used the same statement, which is commented out in my code. I don't understand the reason, though. – aim_bee.1 Jun 02 '20 at 20:24
  • I am checking the first item in the list, then iterating over all items. In your code, if there is only one link, how could you iterate over it? What error are you getting with this solution? – Mesut GUNES Jun 02 '20 at 20:29
  • @Mesut GUNES I am iterating because links is a list, but that is not the point. If I use css_first_name, I get InvalidSelectorException, and if I use the commented-out line, I get the same issue of the page not loading fully. – aim_bee.1 Jun 02 '20 at 20:38
  • This means that your XPath, and therefore the CSS selector derived from it, is not correct. Can you add your HTML so we can find the correct locator? – Mesut GUNES Jun 02 '20 at 20:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215203/discussion-between-aim-bee-1-and-mesut-gunes). – aim_bee.1 Jun 02 '20 at 20:51
  • @aim_bee.1 did you figure out the element problem? – Mesut GUNES Jun 04 '20 at 05:59
  • No dude, it's not working that way either. I found a temporary fix: opening and closing chromedriver in every loop iteration, which seems to work fine, but there is an issue with run time. – aim_bee.1 Jun 05 '20 at 18:18
  • Dude, you should learn more and then try again. This thread is not enough for you. – Mesut GUNES Jun 05 '20 at 18:37
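
The comments above mention two requirements: trying name, name+education, name+career title, and name+skills in order, and treating an empty search (a timed-out wait) as "no result" instead of aborting. A rough sketch of how those could fit together, assuming each search can be wrapped in a function that returns a list of hrefs. `first_unique_url`, `safe_search`, and `run_search` are hypothetical names, not Selenium API; with Selenium, `no_result_errors` would typically be `selenium.common.exceptions.TimeoutException`.

```python
def first_unique_url(queries, search, default="N/A"):
    """Return the href of the first query that yields exactly one result."""
    for query in queries:
        hrefs = search(query)  # expected to return [] when nothing is found
        if len(hrefs) == 1:
            return hrefs[0]
    # No query produced a unique match.
    return default


def safe_search(run_search, no_result_errors):
    """Wrap run_search so the given exception types mean 'no results'."""
    def search(query):
        try:
            return run_search(query)
        except no_result_errors:
            # A timed-out wait is treated as an empty result, so the caller
            # can move on to the next query combination.
            return []
    return search
```

With this shape, the loop over the CSV rows would build the list of query strings for each person and store whatever `first_unique_url` returns, so a failed wait on one combination no longer stops the remaining ones from running.
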