I am trying to extract a CSV file that is stored behind a blob URL on this page using Beautiful Soup: https://worldpopulationreview.com/country-rankings/exports-by-country

Here's my code:

    exports = pd.read_csv(io.StringIO(requests.get(
        BeautifulSoup(requests.get('https://worldpopulationreview.com/country-rankings/exports-by-country').text,
                      'html.parser').find_all(download="csvData.csv"))))

What I got was an exception and no blob link in the href, even though the blob URL does exist when I inspect the HTML in my browser.

Since the href does not expose the blob URL, I decided to just do a GET request for the blob URL itself instead of scraping it, but this exception appears:

requests.exceptions.InvalidSchema: No connection adapters were found for 'blob:https://worldpopulationreview.com/850ac28e-9cd9-46b6-9423-e96a0bd7e938'

Is there a way to web scrape blob URLs?

Riku
1 Answer

These blob: URLs are created only in the browser, usually with JavaScript; they don't exist on the server at all, so you cannot download them with requests.

You could run a JavaScript snippet in the browser console to get the content; here is an example of how to fetch a blob URL in JavaScript: https://stackoverflow.com/a/52410044/

If you need to do this automatically, you could create a userscript, or use an automation tool like AutoHotkey to click the download link for you.
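
For instance, the click can be scripted from Python with Selenium instead of AutoHotkey. A minimal sketch, assuming the selenium package and a matching Chrome/ChromeDriver are installed (the a[download="csvData.csv"] selector comes from the question; the downloads folder name is illustrative):

    import os
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Tell Chrome where to save the file so the click produces a predictable path.
    download_dir = os.path.abspath("downloads")
    os.makedirs(download_dir, exist_ok=True)

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {"download.default_directory": download_dir})

    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://worldpopulationreview.com/country-rankings/exports-by-country")

        # Wait until the page's JavaScript has built the blob link, then click it;
        # Chrome saves csvData.csv into download_dir.
        link = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, 'a[download="csvData.csv"]'))
        )
        link.click()
        time.sleep(5)  # crude wait so the download can finish before the browser closes
    finally:
        driver.quit()

After that, the saved file in download_dir can be read with pd.read_csv as usual.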

cuzi
    If anyone is interested, I used Selenium: https://pypi.org/project/selenium/. If you want to automate Chrome, you will need to download ChromeDriver: https://chromedriver.chromium.org/getting-started. – Riku Aug 24 '22 at 02:14
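
To expand on the Selenium route from this comment: the browser-side fetch from the linked answer can also be driven with execute_async_script, so the CSV text comes straight back into Python instead of being saved to disk. Another sketch under the same assumptions (selenium plus a working ChromeDriver; the selector again comes from the question):

    import io

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://worldpopulationreview.com/country-rankings/exports-by-country")

        # Make sure the page's JavaScript has created the blob link.
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a[download="csvData.csv"]'))
        )

        # fetch() runs inside the page, where the blob: URL is valid, and the
        # CSV text is handed back to Python through the async-script callback.
        csv_text = driver.execute_async_script("""
            const done = arguments[arguments.length - 1];
            const link = document.querySelector('a[download="csvData.csv"]');
            fetch(link.href).then(r => r.text()).then(done);
        """)
    finally:
        driver.quit()

    exports = pd.read_csv(io.StringIO(csv_text))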