
I'm trying to download a CSV file from a URL that does not end in a ".csv" suffix. The URL is: https://www.ishares.com/de/professionelle-anleger/de/produkte/270048/ishares-msci-world-value-factor-ucits-etf/1478358465952.ajax?fileType=csv&fileName=IS3S_holdings&dataType=fund&asOfDate=20180731

Since there is no ".csv" suffix, I haven't found a solution to this problem. My current code looks like this:

import pandas as pd

link = "https://www.ishares.com/de/professionelle-anleger/de/produkte/270048/ishares-msci-world-value-factor-ucits-etf/1478358465952.ajax?fileType=csv&fileName=IS3S_holdings&dataType=fund&asOfDate=20180731"
data = pd.read_csv(link)

Any help is really appreciated. Thanks!

Aran-Fey
j. DOE

2 Answers


Take a look at the file itself. There is a header line before the data starts, and the skiprows parameter of read_csv lets you skip it:

data = pd.read_csv(link, skiprows=2) 
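
To see what skiprows does without touching the network, here is a minimal sketch on an in-memory string. The sample text and its column names are made up for illustration and are not the real contents of the iShares file; only the shape (two preamble lines, then the column header) is assumed.

```python
import io

import pandas as pd

# Hypothetical sample: two preamble lines, then the real column header.
# The actual layout of the downloaded file may differ.
sample = (
    "Fund Holdings as of,31.07.2018\n"
    "Some preamble line\n"
    "Ticker,Name,Weight\n"
    "AAA,Alpha Corp,1.5\n"
    "BBB,Beta Inc,0.7\n"
)

# skiprows=2 drops the first two lines, so the third line becomes the header.
data = pd.read_csv(io.StringIO(sample), skiprows=2)
print(data.columns.tolist())
print(len(data))
```

If the number of preamble lines in the real file is different, skiprows has to be adjusted to match.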
karla

The code I tried:

from urllib.request import urlopen  # Python 3

link = "https://www.ishares.com/de/professionelle-anleger/de/produkte/270048/ishares-msci-world-value-factor-ucits-etf/1478358465952.ajax?fileType=csv&fileName=IS3S_holdings&dataType=fund&asOfDate=20180731"
local_file_name = 'test.csv'

file_size_dl = 0
block_sz = 8192

# Stream the response to disk in fixed-size chunks.
with urlopen(link) as u, open(local_file_name, 'wb') as f:
    while True:
        chunk = u.read(block_sz)
        if not chunk:
            break
        file_size_dl += len(chunk)
        f.write(chunk)

The code calls urlopen to download the URL. Curiously, the file it produces is an HTML page:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" prefix="og: http://ogp.me/ns#" lang="de" xml:lang="de">
<head>
<title>iShares by BlackRock - Führender ETF Anbieter weltweit</title>
<link type="image/x-icon" href="//assets.blackrock.com/uk-retail-assets/ishares-

However, opening the same URL in a web browser does return the CSV data. Once you have saved that CSV locally, you can read it with pandas:

data = pd.read_csv(filename, skiprows=2, header=1)
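
One possible explanation for the HTML response is that the server treats clients that do not look like a browser differently. That is an assumption and has not been verified against this site. Under that assumption, a minimal sketch would send a browser-like User-Agent header and check whether the payload is HTML before treating it as CSV. The helper names (build_request, looks_like_html, fetch_csv_text) and the header value are hypothetical and chosen only for illustration.

```python
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent; the exact string is illustrative only.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Return a Request that carries the browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)


def looks_like_html(payload):
    """Heuristic check: True if the raw bytes start like an HTML page."""
    # Strip a UTF-8 byte order mark and leading whitespace before comparing.
    head = payload[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def fetch_csv_text(url, encoding="utf-8"):
    """Download url and return its text, refusing HTML responses.

    The default encoding is an assumption; the real file may use another one.
    """
    with urlopen(build_request(url)) as response:
        payload = response.read()
    if looks_like_html(payload):
        raise ValueError("server returned an HTML page, not CSV data")
    return payload.decode(encoding)
```

If fetch_csv_text(link) succeeds, the returned text can be wrapped in io.StringIO and passed to pd.read_csv with the same skiprows and header arguments as above. If it still raises, the site probably needs something a plain request does not provide (cookies or a session, for example), and downloading through the browser remains the fallback.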
Lorenzo