
I want to download a file automatically using Python 3. The URL is https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

When you enter the URL manually in a browser, it prompts you to download the file, but I want to do this automatically in Python and load the data as a DataFrame.

Running the code below raises a URLError:

from urllib.request import urlretrieve
import pandas as pd

# Assign url of file: url
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Save file locally
urlretrieve(url, 'my-sheet.xls')

# Read file into a DataFrame and print its head
df=pd.read_excel('my-sheet.xls')
print(df.head())

URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

Zack

4 Answers


$ curl https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>307 Temporary Redirect</title>
</head><body>
<h1>Temporary Redirect</h1>
<p>The document has moved <a href="https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls">here</a>.</p>
</body></html>

You are just being redirected. There are ways to handle this in code, but I would simply change url to "https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls"
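If you would rather handle the redirect in code, here is a minimal sketch using only the standard library. It relies on urlopen following 3xx redirects for GET requests by itself, and on geturl() reporting the address that finally served the response. The helper name fetch_following_redirects and the 30-second timeout are my own choices for illustration, not anything from the question.

```python
from urllib.request import urlopen


def fetch_following_redirects(url, timeout=30):
    """Return (final_url, body_bytes) for url, following HTTP redirects.

    Hypothetical helper for illustration. urlopen already follows
    301/302/303/307 redirects for GET requests, so no extra handling
    is needed here.
    """
    with urlopen(url, timeout=timeout) as response:
        # geturl() is the URL that actually served the body, which differs
        # from the requested URL when a redirect happened.
        return response.geturl(), response.read()
```

You could then call fetch_following_redirects(url), compare the returned address with the one you asked for, and write the returned bytes to 'my-sheet.xls' before passing that file to pd.read_excel.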

Nikolas

I ran your code in a Jupyter environment and it worked. No error was raised, but the DataFrame contains only NaN values. I checked the xls file you are trying to read, and it does not seem to contain any data.

There are other ways to retrieve xls data, such as the approach described in "Downloading an Excel file from the web in Python", which uses requests:

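import pandas as pd
import requests

url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Download the file; requests follows redirects by default
resp = requests.get(url)
resp.raise_for_status()  # stop here on an HTTP error status

# Write the raw bytes to disk
with open('my-sheet.xls', 'wb') as output:
    output.write(resp.content)

# Read file into a DataFrame and print its head
df = pd.read_excel('my-sheet.xls')
print(df.head())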
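If the DataFrame comes back empty, it may be worth checking what was actually downloaded before handing it to pandas. The sketch below is only a rough heuristic: it looks at the first bytes of the payload to tell a legacy .xls file (OLE2 compound file signature) from an .xlsx file (ZIP signature) or from an HTML or XML page, such as a redirect notice saved under an .xls name. The function name sniff_download and its return labels are made up for this example, and whether this explains the NaN values here is an assumption I have not verified.

```python
# Signature of the OLE2 compound file format used by legacy .xls workbooks
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# .xlsx workbooks are ZIP archives, which start with this local file header
ZIP_MAGIC = b"PK\x03\x04"


def sniff_download(data: bytes) -> str:
    """Guess what a downloaded payload is from its leading bytes.

    Hypothetical helper; returns 'xls', 'xlsx', 'markup' or 'unknown'.
    """
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # HTML and XML documents start with '<' once leading whitespace is dropped
    if data.lstrip()[:1] == b"<":
        return "markup"
    return "unknown"
```

With the requests code above, sniff_download(resp.content) returning 'markup' would suggest that the server sent a web page rather than a workbook.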


You can do it directly with pandas, using its read_excel method:

df = pd.read_excel("https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls", sheet_name='Data', skiprows=5)

df.head(1)


Ananay Mital

Sorry mate, it works on my PC (not a very helpful comment, to be honest). Here's a list of things you can try:

  • Get the response object and check its status code (2xx means success and 3xx means a redirect; 4xx and 5xx indicate client and server errors)
  • Check whether the site blocks bot access (some sites do)
  • If bot access is blocked, use Selenium for Python
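The first two checks can be sketched with the standard library before reaching for Selenium. This is a rough illustration under my own assumptions: describe_status and browser_like_request are hypothetical helper names, and the "Mozilla/5.0" User-Agent string is just a placeholder. Sending a browser-like User-Agent only helps if the site filters on that header, which I have not confirmed for this host.

```python
from urllib.request import Request


def describe_status(code: int) -> str:
    """Map an HTTP status code to its general class (hypothetical helper)."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "informational or unknown"


def browser_like_request(url: str, user_agent: str = "Mozilla/5.0") -> Request:
    """Build a request that identifies itself with a browser-style User-Agent.

    The default User-Agent value is a placeholder chosen for illustration.
    """
    return Request(url, headers={"User-Agent": user_agent})
```

Passing browser_like_request(url) to urllib.request.urlopen, together with a timeout, returns a response whose status attribute can be fed to describe_status. Note that urlopen raises HTTPError for 4xx and 5xx responses, so those codes are read from the exception's code attribute instead.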
Kaustubh J