
The CSV file contains the names of the countries used. However, after Argentina, the script fails to recover the URL and returns an empty string.

country,country_url
Afghanistan,https://openaq.org/#/locations?parameters=pm25&countries=AF&_k=tomib2
Algeria,https://openaq.org/#/locations?parameters=pm25&countries=DZ&_k=dcc8ra
Andorra,https://openaq.org/#/locations?parameters=pm25&countries=AD&_k=crspt2
Antigua and Barbuda,https://openaq.org/#/locations?parameters=pm25&countries=AG&_k=l5x5he
Argentina,https://openaq.org/#/locations?parameters=pm25&countries=AR&_k=962zxt
Australia,
Austria,
Bahrain,
Bangladesh,

The country.csv looks like this:

Afghanistan,Algeria,Andorra,Antigua and Barbuda,Argentina,Australia,Austria,Bahrain,Bangladesh,Belgium,Bermuda,Bosnia and Herzegovina,Brazil,

The code used is:

driver = webdriver.Chrome(options = options, executable_path = driver_path)
url = 'https://openaq.org/#/locations?parameters=pm25&_k=ggmrvm'
driver.get(url)
time.sleep(2)

# This function opens .csv file that we created at the first stage
# .csv file includes names of countries
with open('1Countries.csv', newline='') as f:
    reader = csv.reader(f)
    list_of_countries = list(reader)
    list_of_countries = list_of_countries[0]
    print(list_of_countries) # printing a list of countries

# Let's create Data Frame of the country & country_url
df = pd.DataFrame(columns=['country', 'country_url'])

# With this function we are generating urls for each country page
for country in list_of_countries[:92]:
    try:
        path = ('//span[contains(text(),' + '\"' + country + '\"' + ')]')
        # "path" is used to filter each country on the website by
        # iterating country names.
        next_button = driver.find_element_by_xpath(path)
        next_button.click()
        # Clicking the button takes us to the page for the next country
        time.sleep(2)
        country_url = (driver.current_url)
        # "country_url" is used to get the url of the current page
        next_button.click()
    except:
        country_url = None

    d = [{'country': country, 'country_url': country_url}]
    df = df.append(d)

I've tried increasing the sleep time, but I'm not sure what is causing this.

Emma Vaze
  • Hi Emma, it's most likely because, in the list on the left of the page where you're selecting the country, you can't see the box to click after Argentina. You need to scroll it into view. I'll put something together in a mo, but in the interim have a look at this: https://stackoverflow.com/questions/41744368/scrolling-to-element-using-webdriver – RichEdwards Jul 30 '20 at 07:52

1 Answer


The challenge you face is that the country list is scrollable:

[image: filter list]

It's no coincidence that your code stops working at the countries that aren't displayed.

It's a relatively easy solution - you need to scroll it into view. I've made a quick test with your code to confirm it's working. I removed the CSV part, hard-coded a country that's further down the list, and added the parts that make it scroll into view:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time

def ScrollIntoView(element):
    actions = ActionChains(driver)
    actions.move_to_element(element).perform()


url = 'https://openaq.org/#/locations?parameters=pm25&_k=ggmrvm'
driver = webdriver.Chrome()
driver.get(url)
driver.implicitly_wait(10)

country = 'Bermuda'

path = ('//span[contains(text(),' + '\"' + country + '\"' + ')]')
next_button = driver.find_element_by_xpath(path)
ScrollIntoView(next_button)  # added this
next_button.click()
time.sleep(2)
country_url = (driver.current_url)
print(country_url)  # added this
next_button.click()

This is the output from the print:

https://openaq.org/#/locations?parameters=pm25&countries=BM&_k=7sp499

You happy to merge that into your solution? (just say if you need more support)
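
To merge it into the loop, one option is to wrap the scroll-and-click steps in a helper. The following is only a sketch: `build_country_xpath` and `get_country_url` are hypothetical names, and it scrolls with JavaScript's `scrollIntoView` through `driver.execute_script` (the approach from the question linked in the comments) instead of `ActionChains`. It assumes `driver` is a Selenium WebDriver and that the page behaves as in the test above.

```python
import time


def build_country_xpath(country):
    # Same XPath as in the question: a <span> whose text contains the country name.
    return '//span[contains(text(),"' + country + '")]'


def get_country_url(driver, country, pause=2.0):
    """Scroll the country's filter entry into view, click it, and return the URL."""
    # "xpath" is the string value of Selenium's By.XPATH locator strategy.
    element = driver.find_element("xpath", build_country_xpath(country))
    # Ask the browser to scroll the element into view before clicking it.
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    element.click()      # apply the country filter
    time.sleep(pause)    # give the page time to update its URL
    country_url = driver.current_url
    element.click()      # clear the filter again for the next country
    return country_url
```

In the original loop, `country_url = get_country_url(driver, country)` would replace the find/click/sleep lines inside the `try`.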

If it helps, one reason you didn't notice this yourself is that the try block was masking an ElementNotInteractableException. Have a look at how to handle errors here

try statements are great and useful, but it's also good to track when exceptions occur so you can fix them later. Borrowing some code from that link, you can try something like this in your except block:

# requires "import sys" at the top of the script
except:
    print("Unexpected error:", sys.exc_info()[0])
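
If you'd rather keep a record of which countries failed and why, a variation is to catch `Exception` and store the exception's class name next to each row. This is a minimal sketch, not tested against the site: `collect_country_urls` and the `fetch_url` callable are hypothetical names, and `fetch_url` stands in for whatever code clicks the country and reads `driver.current_url`.

```python
def collect_country_urls(countries, fetch_url):
    """Call fetch_url(country) for each country and record failures by type."""
    rows = []
    for country in countries:
        try:
            country_url = fetch_url(country)
            error = None
        except Exception as exc:
            # Keep going, but remember what went wrong instead of hiding it.
            country_url = None
            error = type(exc).__name__
            print("Failed for", country, "->", error)
        rows.append({"country": country, "country_url": country_url, "error": error})
    return rows
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` once at the end, which also avoids calling `df.append` on every iteration.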
RichEdwards
  • Yes, that was the problem; I understand it now. Also, good catch on the try block, I will include that too. Thank you! – Emma Vaze Jul 30 '20 at 09:39