Perform Download via download button in Python

Question

I am fairly new to getting data from websites.

Take, for example, the page http://www.ariva.de/adidas-aktie/historische_kurse, which has a somewhat hidden download button, marked in red in the picture below:

My main question is: how can I download that file in Python? I tried several approaches found on the web (e.g. Beautiful Soup, ScraperWiki) but could not get them to work. The HTML around the download form looks like this:

> Kurse als CSV-Datei       </h3> <div class="clearfloat"></div> </div>
> <form action="/quote/historic/historic.csv" method="get"
> name="histcsv"> <input type="hidden" name="secu" value="291" /> <input
> type="hidden" name="boerse_id" value="6" /> <input type="hidden"
> name="clean_split"  value="1" /> <input type="hidden"
> name="clean_payout" value="1" /> <input type="hidden"
> name="clean_bezug"  value="1" /> <input type="hidden" name="currency" 
> value="EUR" /> <ul style="margin:5px;"> <li> <label
> for="minTime">von:</label> <input id="minTime" name="min_time"
> value="8.2.2016" style="width:71px" /> </li> <li> <label
> for="maxTime">bis:</label> <input id="maxTime" name="max_time"
> value="8.2.2017" style="width:71px" /> </li> <li> <label
> for="trenner">Trennzeichen:</label> <input id="trenner" name="trenner"
> value=";" style="width:25px" /> </li> <li> <input class="submitButton"
> name="go" value="Download" type="submit" /> </li> </ul> </form> </div>
> </div> <div class="clearfloat"></div> </div> </div> </div> <div
> id="foot" class="noprint"> <div class="adControllerAd evtAdShow 
> noprint abstand adHide" id="iqadtile16"> </div> <div id="footer"> <div
> class="footer abstand"> <a
> href="/adidas-aktie/historische_kurse?boerse_id=6&currency=EUR&clean_split=1&clean_payout=1&clean_bezug=1&min_time=2014-09-01&max_time=2017-02-07/wkn_A1EWWW_historic.csv"
> class="anker"> <img src="/forum/i/up.gif" alt="" width="9"
> height="9">Zum Seitenanfang</a> <a
> href="/fehlermeldung/index.m?ag=291&amp;referrer=&amp;ssl=0&amp;url=%2Fadidas-aktie%2Fhistorische_kurse%3Fboerse_id%3D6%26currency%3DEUR%26clean_split%3D1%26clean_payout%3D1%26clean_bezug%3D1%26min_time%3D2014-09-01%26max_time%3D2017-02-07%2Fwkn_A1EWWW_historic.csv"

宏杰李 · Accepted Answer · 2017-02-08T14:42:19.880

import requests

url = 'http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download'
r = requests.get(url)
with open('a.csv', 'wb') as f:
    f.write(r.content)

You can monitor the network traffic with Chrome DevTools: when you click the download button, the browser sends a GET request to the server, and you can mimic that request with requests.
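As a minimal sketch, the same URL can be assembled from a parameter dictionary instead of a hard-coded string, which makes it easier to swap in another security ID or date range. The helper name `build_download_url` is hypothetical; the field names and values are the ones from the form and the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.ariva.de/quote/historic/historic.csv"


def build_download_url(secu, boerse_id, min_time, max_time, trenner=";"):
    """Assemble the CSV download URL from the form's field values."""
    params = {
        "secu": secu,
        "boerse_id": boerse_id,
        "clean_split": 1,
        "clean_payout": 0,
        "clean_bezug": 1,
        "min_time": min_time,
        "max_time": max_time,
        "trenner": trenner,
        "go": "Download",
    }
    # urlencode percent-encodes the separator, so ";" becomes "%3B".
    return BASE_URL + "?" + urlencode(params)


url = build_download_url(291, 6, "8.2.2016", "8.2.2017")
print(url)
```

The resulting string can be passed to `requests.get(url)` exactly like the hard-coded URL.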

How to find the parameters in the URL:

You can parse the page to get the parameters you need, then build the download URL and pass it to pandas.

Reading from the link with pandas:

import pandas as pd
# trenner=%3B in the URL means the file is ';'-separated
df = pd.read_csv('http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download', sep=';')

How to get the parameters:

import requests, bs4

url = 'http://www.ariva.de/adidas-aktie/historische_kurse'
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# Collect the name/value pair of every input in the "histcsv" form.
payload = {field['name']: field['value'] for field in soup.select('form[name="histcsv"] input')}
# The form declares method="get", so send the fields as query parameters.
response = requests.get('http://www.ariva.de/quote/historic/historic.csv', params=payload)
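If Beautiful Soup or lxml is not installed, the same extraction can be sketched with the standard library alone. This is only an illustration: the class name `FormFieldParser` is hypothetical, and `sample_html` is a shortened copy of the form from the question (plus one made-up input outside the form), not the live page.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> tags inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
        elif tag == "input" and self.in_form and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


# Shortened copy of the form shown in the question.
sample_html = '''
<form action="/quote/historic/historic.csv" method="get" name="histcsv">
<input type="hidden" name="secu" value="291" />
<input type="hidden" name="boerse_id" value="6" />
<input id="minTime" name="min_time" value="8.2.2016" style="width:71px" />
<input id="trenner" name="trenner" value=";" style="width:25px" />
<input class="submitButton" name="go" value="Download" type="submit" />
</form>
<input name="outside" value="ignored" />
'''

parser = FormFieldParser("histcsv")
parser.feed(sample_html)
payload = parser.fields
payload["min_time"] = "1.1.2017"  # override a default before sending
print(payload)
```

Against the real site, the page text from `requests.get(url).text` would be fed to the parser in place of `sample_html`.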

Thanks a lot. Is it possible to read values such as secu=291&boerse_id=6 out automatically instead of entering them manually? Furthermore, can I load the CSV file directly into a pandas DataFrame? Also, the GET request shown in Chrome's developer tools gives me a different link. — MCM, Feb 08 '17 at 12:51
Thanks for the answer. I am still struggling to construct an easy scraper that reads the security ID automatically. — MCM, Feb 08 '17 at 14:05
Perfect, thanks a lot! That is what I was looking for. Much appreciated. — MCM, Feb 08 '17 at 15:51

score 0 · Answer 2 · answered Feb 08 '17 at 12:26

I would suggest using Selenium, which can execute JavaScript without any extra effort. You can also use PhantomJS as a headless browser.

kawadhiya21

score 0 · Answer 3 · answered Feb 08 '17 at 12:32

You can get the download response via this GET request:

http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download

Here min_time and max_time are the two dates you need to provide, and trenner is the separator. You can receive the response and then write it to a file.

import requests

response = requests.get('http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download')

# A with-block closes the file even if the write fails.
with open('download.csv', 'w') as file:
    file.write(response.text)
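To avoid typing min_time and max_time by hand, the two date strings could be generated, for example for the year ending on a given day. This is a sketch under one assumption: that the server expects the unpadded D.M.YYYY form seen in the form defaults (8.2.2016). The helper names `ariva_date` and `one_year_window` are hypothetical.

```python
from datetime import date


def ariva_date(d):
    """Format a date as D.M.YYYY without zero padding, e.g. 8.2.2016."""
    return f"{d.day}.{d.month}.{d.year}"


def one_year_window(end):
    """Return (min_time, max_time) strings covering the year before `end`."""
    try:
        start = end.replace(year=end.year - 1)
    except ValueError:
        # 29 February has no counterpart in the previous year; use 28 February.
        start = end.replace(year=end.year - 1, day=28)
    return ariva_date(start), ariva_date(end)


min_time, max_time = one_year_window(date(2017, 2, 8))
print(min_time, max_time)
```

Calling `one_year_window(date.today())` would give a rolling window ending today.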

Thanks a lot. I am repeating my comment from above: is it possible to read values such as secu=291&boerse_id=6 out automatically instead of entering them manually? Furthermore, can I load the CSV file directly into a pandas DataFrame? Also, the GET request shown in Chrome's developer tools gives me a different link. — MCM, Feb 08 '17 at 12:52
