
I want to parse addresses from this website (https://www.conad.it/) with Python, after searching for a CAP (Italian postal code) in the search bar and submitting it. For many CAPs the search returns the addresses of several stores, and I want to scrape all of them, not just the first one (which is what my code currently does).

Here's my code so far:

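from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(service=Service('pathtoChrome/chromedriver.exe'))
driver.get("https://www.conad.it/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@href='javascript:void(0)']"))).click() # accept the cookies
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='location-input']"))).send_keys("11100") # type the CAP
driver.find_element(By.XPATH, "//input[@class = 'btn btn-default btn-lg btn-block']").click() # submit the search
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'col-md-8')]"))).get_attribute("innerHTML")) # first result card only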

This prints the following output:

<h3>Conad</h3><p>Frazione Condemine 84, 11010  Sarre</p><div class="extra-services extra-services-buttons extra-services-desktop extra-services-simple"><ul class="carousel-services"></ul></div>

I only want the text inside the <p> tag of that output, but for every element with the class col-md-8, so for this CAP I would also get the second address.
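If you already have an innerHTML string like the one above, the <p> text can also be pulled out of it without the browser. This is a minimal sketch using the standard library's html.parser on that one captured string; the class and function names are made up for illustration, and locating the <p> elements directly with Selenium avoids this extra step.

```python
from html.parser import HTMLParser


class ParagraphTextParser(HTMLParser):
    """Collects the text content of every <p> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace, e.g. the double space in "11010  Sarre".
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._in_p = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside a <p>.
        if self._in_p:
            self._buffer.append(data)


def paragraph_texts(fragment):
    """Return the whitespace-normalised text of each <p> in the fragment."""
    parser = ParagraphTextParser()
    parser.feed(fragment)
    parser.close()
    return parser.paragraphs


inner_html = (
    '<h3>Conad</h3><p>Frazione Condemine 84, 11010  Sarre</p>'
    '<div class="extra-services extra-services-buttons extra-services-desktop '
    'extra-services-simple"><ul class="carousel-services"></ul></div>'
)
print(paragraph_texts(inner_html))  # ['Frazione Condemine 84, 11010 Sarre']
```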

Ideally I want to store the results in a dataset that I can append to as I loop over several different CAPs, something like this (which doesn't work yet):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(service=Service('pathtoChrome/chromedriver.exe'))
driver.get("https://www.conad.it/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@href='javascript:void(0)']"))).click() # accept the cookies
CAPS = ['11100']
for CAP in CAPS:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='location-input']"))).send_keys(CAP) # type the CAP
    driver.find_element(By.XPATH, "//input[@class = 'btn btn-default btn-lg btn-block']").click() # submit the search
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'col-md-8')]"))).get_attribute("innerHTML"))
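One way to structure the "append over several CAPs" part is to keep the browser steps behind a single function and collect plain rows. In this sketch, fetch_addresses is a hypothetical callable standing in for the send_keys/click/read steps of the loop; here it is faked with canned results copied from the output in this thread, so the bookkeeping can run without a browser.

```python
import csv
import io


def collect_addresses(caps, fetch_addresses):
    """Return one row per (CAP, address) pair.

    fetch_addresses is a hypothetical callable: it takes a CAP string and
    returns the list of address strings found for it.
    """
    rows = []
    for cap in caps:
        for address in fetch_addresses(cap):
            rows.append({"cap": cap, "address": address})
    return rows


def rows_to_csv(rows):
    """Serialise the collected rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["cap", "address"])
    writer.writeheader()
    writer.writerows(rows)  # addresses containing commas get quoted
    return buffer.getvalue()


# Stand-in for the browser: canned results instead of live scraping.
fake_results = {
    "11100": [
        "Frazione Condemine 84, 11010 Sarre",
        "Grand Chemin C/c Centreville 3, 11020 Saint-christophe",
        "Localita' Arensod 27, 11010 Sarre",
    ],
}
rows = collect_addresses(["11100"], lambda cap: fake_results.get(cap, []))
print(rows_to_csv(rows))
```

Adding more CAPs only adds more rows to the same list. If pandas is installed, pandas.DataFrame(rows) turns that list of dicts into a DataFrame.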

Any help is appreciated!

KunduK
tiny

2 Answers


You can use WebDriverWait() with visibility_of_all_elements_located() and the following XPath to collect the text of every <p> tag into a list:

print([item.text for item in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class,'col-md-8')]//p")))])

The output is a list like this:

['Frazione Condemine 84, 11010 Sarre', 'Grand Chemin C/c Centreville 3, 11020 Saint-christophe', "Localita' Arensod 27, 11010 Sarre"]
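If the goal is a data set with separate columns, each string in that list can be split into street, CAP and town. This is a minimal sketch that assumes every result follows the "<street>, <5-digit CAP> <town>" pattern seen above; the names ADDRESS_RE and split_address are illustrative, and a string that doesn't match comes back as None instead of being guessed.

```python
import re

# Assumed pattern: "<street>, <5-digit CAP> <town>", as in the list above.
# The greedy street group backtracks to the last comma followed by a CAP.
ADDRESS_RE = re.compile(r"^(?P<street>.+),\s*(?P<cap>\d{5})\s+(?P<town>.+)$")


def split_address(address):
    """Split one address string into street, CAP and town.

    Returns None when the string does not follow the assumed pattern.
    """
    match = ADDRESS_RE.match(address.strip())
    return match.groupdict() if match else None


addresses = [
    "Frazione Condemine 84, 11010 Sarre",
    "Grand Chemin C/c Centreville 3, 11020 Saint-christophe",
    "Localita' Arensod 27, 11010 Sarre",
]
for address in addresses:
    print(split_address(address))
```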
KunduK
  • Perfect, thanks a lot @KunduK, this solves the problem without the loop! However, I still run into problems when I loop over CAPs, and I will have to make this work for several postal codes. Any help on that? It seems to get stuck on the second-to-last line, where I use `.click()`. – tiny Jan 06 '21 at 15:25
  • Okay, I did, thanks! Here's the new question: https://stackoverflow.com/questions/65599537/how-to-parse-data-with-looping-over-search-field-and-appending-output-in-a-data – tiny Jan 06 '21 at 16:29

You were pretty close. A few points:

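  • By.XPATH, "//a[@href='javascript:void(0)']" is not a stable locator strategy: any number of links on the page can use that href, and the wait simply acts on the first match. A more specific and stable option is By.XPATH, "//a[@class='cc_btn cc_btn_accept_all']", which targets the cookie consent button itself: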

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='cc_btn cc_btn_accept_all']"))).click()
    
  • The address texts, such as Frazione Condemine 84, 11010 Sarre, are inside the child <p> tag of the <div>, so you need to go one level deeper.

  • Likewise, instead of By.XPATH, "//div[contains(@class,'col-md-8')]", you can build a more reliable locator from the fact that the <p> tag always comes right after an <h3> containing the text Conad.

  • An optimized solution:

    • Using xpath and get_attribute("innerHTML"):

      print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[contains(.,'Conad')]/following-sibling::p[1]")))])
      
    • Using xpath and text attribute:

      print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[contains(.,'Conad')]/following-sibling::p[1]")))])
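Both locator points can be checked offline against a static page. The markup below is hypothetical and heavily simplified (a menu link that shares the javascript:void(0) href, plus two result cards shaped like the innerHTML in the question); the real conad.it page may differ. ElementTree supports [@attr='value'] predicates but not contains() or the following-sibling axis, so the "first <p> after an <h3> mentioning Conad" lookup is done by walking children instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the live page is larger and may differ.
PAGE = """
<body>
  <a href="javascript:void(0)" class="menu-toggle">Menu</a>
  <div class="cc_banner">
    <a href="javascript:void(0)" class="cc_btn cc_btn_accept_all">Accept</a>
  </div>
  <div class="col-md-8">
    <h3>Conad</h3><p>Frazione Condemine 84, 11010 Sarre</p><div class="extra-services"></div>
  </div>
  <div class="col-md-8">
    <h3>Conad</h3><p>Grand Chemin C/c Centreville 3, 11020 Saint-christophe</p>
  </div>
</body>
"""

root = ET.fromstring(PAGE.strip())

# The href-based locator matches every javascript:void(0) link, so the
# first match is not necessarily the accept button.
by_href = root.findall(".//a[@href='javascript:void(0)']")
# The class-based locator singles out the accept button.
by_class = root.findall(".//a[@class='cc_btn cc_btn_accept_all']")


def paragraphs_after_conad_h3(tree):
    """Mimic the h3 -> first following-sibling <p> locator.

    ElementTree has no following-sibling axis, so walk each parent's
    children and take the first <p> after an <h3> whose text has 'Conad'.
    """
    texts = []
    for parent in tree.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "h3" and "Conad" in "".join(child.itertext()):
                for sibling in children[i + 1:]:
                    if sibling.tag == "p":
                        texts.append("".join(sibling.itertext()))
                        break
    return texts


print([a.text for a in by_href])   # ['Menu', 'Accept']
print([a.text for a in by_class])  # ['Accept']
print(paragraphs_after_conad_h3(root))
```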
      
undetected Selenium