0

Sorry for bothering you with my request. I have started to get acquaintance with web-scraping with the library BeautifulSoup. Beacuase I have to download some data from OECD's websites I wanted to try some web-scraping approaches. More specifically, I wanted to download a .csv file from the following page: https://goingdigital.oecd.org/en/indicator/50/

As you can see, data can be easily downloaded by clicking on 'Download data'. However, because I will have do deal with some a recursive download with loop, I tried to download it directly from the Python console. Therefore, by inspecting the page, I evidenced the download's URL that I have reported in the following picture: enter image description here

Hence, I wrote the following code:

from bs4 import BeautifulSoup
import requests 
from requests import get

url = 'https://goingdigital.oecd.org/en/indicator/50/'
response = get(url)

print(response.text[:500])

html_soup = BeautifulSoup(response.text, 'html.parser')

type(html_soup)

containers = html_soup.find_all('div', {'class': 'css-cqestz e12cimw51'})
print(type(containers))
print(len(containers))

d = []
for a in containers[0].find_all('a', href = True):
    print(a['href'])
    d.append(a['href'])

The object containers is composed by three elements since there are three divs with the specified class. The first one (the one I have selected in the loop) should be the one containing the URL in which I am interested. However, I get no result. Conversely, when I select the third element of the object containers I get the following output:

https://www.facebook.com/sharer/sharer.php?u=https%3A%2F%2Fgoingdigital.oecd.org%2Fen%2Findicator%2F50%2F
https://twitter.com/intent/tweet?text=OECD%20Going%20Digital%20Toolkit&url=https%3A%2F%2Fgoingdigital.oecd.org%2Fen%2Findicator%2F50%2F
https://www.linkedin.com/shareArticle?mini=true&url=https%3A%2F%2Fgoingdigital.oecd.org%2Fen%2Findicator%2F50%2F
mailto:?subject=OECD%20Going%20Digital%20Toolkit%3A%20Percentage%20of%20individuals%20aged%2055-74%20using%20the%20Internet&body=Percentage%20of%20individuals%20aged%2055-74%20using%20the%20Internet%0A%0Ahttps%3A%2F%2Fgoingdigital.oecd.org%2Fen%2Findicator%2F50%2F

By the way, for this download I guess it could be related to the following thread. Thank you in advance!

1 Answers1

0

When you pull data from a website, you should first check whether the content you are looking for is in the page source. If it's not in the page source, you should try web scraping with selenium.

When I examined the site you mentioned, I could not see it in the page source, it shows that the link you want on this page is dynamically created.

Emre
  • 119
  • 2
  • 8