Accessing data from the internet

Question

I want to access this file automatically using Python 3. The URL is https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

When I enter the URL manually into Internet Explorer, it asks me to download the file, but I want to do this automatically in Python and load the data into a DataFrame.

I get the following URLError when I run this code:

from urllib.request import urlretrieve
import pandas as pd

# Assign url of file: url
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Save file locally
urlretrieve(url, 'my-sheet.xls')

# Read file into a DataFrame and print its head
df=pd.read_excel('my-sheet.xls')
print(df.head())

URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

It might be a problem with your internet connection, because it works perfectly on my side. — HERAwais, Apr 11 '19 at 17:31
Are you behind a proxy? Try [this answer](https://stackoverflow.com/a/16312067/6699913) for possible solutions. — Ayesh Salahuddin, Apr 11 '19 at 18:17
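WinError 10060 is a connection timeout, which fits both suggestions above. A minimal sketch using only urllib.request that downloads with an explicit timeout and, optionally, through a proxy you name yourself; the `download` helper and the proxy address in the usage note are hypothetical, not from the thread:

```python
import urllib.request
from typing import Optional


def download(url: str, dest: str, proxy: Optional[str] = None,
             timeout: float = 30.0) -> str:
    """Save the resource at `url` to `dest` and return `dest`.

    `proxy` is an optional proxy URL (hypothetical example:
    'http://proxy.example.com:8080'). When it is None, build_opener's
    default ProxyHandler picks up the environment/system proxy settings.
    """
    handlers = []
    if proxy:
        # Send both http and https traffic through the explicit proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    # An explicit timeout makes a blocked connection fail after `timeout`
    # seconds instead of relying on the operating system's connect timeout.
    with opener.open(url, timeout=timeout) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

Usage, with a hypothetical proxy address: `download(url, 'my-sheet.xls', proxy='http://proxy.example.com:8080')`, then `pd.read_excel('my-sheet.xls')` as in the question. Without `proxy`, the default opener reads proxy settings from environment variables such as `HTTPS_PROXY` (and, on Windows, from the system proxy configuration).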

score 0 · Answer 1 · answered Apr 11 '19 at 17:38

$ curl https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>307 Temporary Redirect</title>
</head><body>
<h1>Temporary Redirect</h1>
<p>The document has moved <a href="https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls">here</a>.</p>
</body></html>

The server answers with a 307 Temporary Redirect, so you are just being redirected. There are ways to handle this in code, but I would simply change url to "https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls"
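For reference: curl only follows redirects when given `-L`, whereas urllib.request's default opener follows a 307 redirect for GET requests automatically, so the redirect by itself may not be what causes the timeout. A minimal standard-library sketch for checking which URL urllib actually ends up at; `resolve_final_url` is a hypothetical helper name:

```python
import urllib.request


def resolve_final_url(url: str, timeout: float = 30.0) -> str:
    """Return the URL that was finally served after following redirects."""
    # urlopen uses a default opener that includes HTTPRedirectHandler,
    # which follows 301/302/303/307 redirects for GET requests.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # resp.url holds the URL of the last request in the redirect chain.
        return resp.url
```

If this returns the `/document/Resources/...` address, urllib is already following the redirect and the timeout has another cause (for example a proxy or firewall).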

score 0 · Answer 2 · answered Apr 11 '19 at 17:38

I ran your code in a Jupyter environment and it worked: no error was raised, but the DataFrame contains only NaN values. I checked the .xls file you are trying to read, and it does not seem to contain any data.

There are other ways to retrieve .xls data, for example the approach from the question "downloading an excel file from the web in python":

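import requests
import pandas as pd

url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Download the file (requests follows redirects for GET by default)
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # stop here if the server returned an error status

# Save the raw bytes locally; "with" closes the file even on errors
with open('my-sheet.xls', 'wb') as output:
    output.write(resp.content)

# Read the saved file into a DataFrame and print its head
df = pd.read_excel('my-sheet.xls')
print(df.head())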

score 0 · Answer 3 · answered Apr 11 '19 at 17:43

You can do it directly with pandas, since the read_excel method also accepts a URL:

import pandas as pd

df = pd.read_excel("https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls", sheet_name='Data', skiprows=5)

df.head(1)


score 0 · Answer 4 · answered Apr 11 '19 at 18:09

Sorry, mate. It works on my PC (not a very helpful comment, to be honest). Here are some things you can try:

Send a request and check the status code of the response (2xx codes such as 200 mean success and 3xx codes mean a redirect; other codes have different meanings, e.g. 4xx for client errors and 5xx for server errors)
Check whether the site blocks bot access (certain sites do that)
If bot access is blocked, use Selenium for Python
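The first two checks can be sketched with the standard library alone. This assumes the site tells bots apart by their User-Agent header, which is only one of several ways sites detect automated access; the browser-like string and the `check_status` helper are hypothetical, not taken from the answer:

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent; some sites reject Python's default
# "Python-urllib/3.x" agent.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def check_status(url: str, user_agent: str = BROWSER_UA,
                 timeout: float = 30.0) -> int:
    """Return the HTTP status code of a GET request to `url`.

    urlopen follows redirects, so the code is that of the final response.
    HTTP error statuses (4xx/5xx) are returned rather than raised; network
    failures such as timeouts still raise urllib.error.URLError.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code of the failed response.
        return err.code
```

If the default agent gets a 403 while the browser-like one gets 200, the site is filtering on the User-Agent; if both fail, a real browser driven by Selenium may be needed.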
