I am fairly new to scraping data from websites.

Take, for example, the page http://www.ariva.de/adidas-aktie/historische_kurse, which has a download button that is easy to miss (marked in red in the screenshot):

[screenshot: download button marked in red]

My main question is: how can I download that file in Python? I tried a few approaches found on the web (e.g. Beautiful Soup, ScraperWiki), but none of them worked. The HTML around the download form looks like this:

> Kurse als CSV-Datei       </h3> <div class="clearfloat"></div> </div>
> <form action="/quote/historic/historic.csv" method="get"
> name="histcsv"> <input type="hidden" name="secu" value="291" /> <input
> type="hidden" name="boerse_id" value="6" /> <input type="hidden"
> name="clean_split"  value="1" /> <input type="hidden"
> name="clean_payout" value="1" /> <input type="hidden"
> name="clean_bezug"  value="1" /> <input type="hidden" name="currency" 
> value="EUR" /> <ul style="margin:5px;"> <li> <label
> for="minTime">von:</label> <input id="minTime" name="min_time"
> value="8.2.2016" style="width:71px" /> </li> <li> <label
> for="maxTime">bis:</label> <input id="maxTime" name="max_time"
> value="8.2.2017" style="width:71px" /> </li> <li> <label
> for="trenner">Trennzeichen:</label> <input id="trenner" name="trenner"
> value=";" style="width:25px" /> </li> <li> <input class="submitButton"
> name="go" value="Download" type="submit" /> </li> </ul> </form> </div>
> </div> <div class="clearfloat"></div> </div> </div> </div> <div
> id="foot" class="noprint"> <div class="adControllerAd evtAdShow 
> noprint abstand adHide" id="iqadtile16"> </div> <div id="footer"> <div
> class="footer abstand"> <a
> href="/adidas-aktie/historische_kurse?boerse_id=6&currency=EUR&clean_split=1&clean_payout=1&clean_bezug=1&min_time=2014-09-01&max_time=2017-02-07/wkn_A1EWWW_historic.csv"
> class="anker"> <img src="/forum/i/up.gif" alt="" width="9"
> height="9">Zum Seitenanfang</a> <a
> href="/fehlermeldung/index.m?ag=291&amp;referrer=&amp;ssl=0&amp;url=%2Fadidas-aktie%2Fhistorische_kurse%3Fboerse_id%3D6%26currency%3DEUR%26clean_split%3D1%26clean_payout%3D1%26clean_bezug%3D1%26min_time%3D2014-09-01%26max_time%3D2017-02-07%2Fwkn_A1EWWW_historic.csv"
Peter Hall
MCM

3 Answers

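import requests

# URL the browser requests when the "Download" button is clicked
url = 'http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download'
r = requests.get(url)
r.raise_for_status()  # fail early on an HTTP error instead of saving an error page
with open('a.csv', 'wb') as f:
    f.write(r.content)  # raw bytes, written unchanged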

You can monitor the network traffic with Chrome DevTools: when you click the download button, the browser sends a GET request to the server, and you can mimic that request with requests.

How to find the parameters in the URL: [screenshot]
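As a minimal sketch, the query string of that GET request can also be assembled from a dictionary with the standard library instead of being typed by hand. The parameter values are copied from the URL in this answer; `build_download_url` is a hypothetical helper name, not something the site provides.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.ariva.de/quote/historic/historic.csv"


def build_download_url(params, base_url=BASE_URL):
    """Return the GET URL a browser would request when the form is submitted."""
    # urlencode percent-encodes special characters, e.g. ';' becomes '%3B'
    return base_url + "?" + urlencode(params)


# Values copied from the URL above; dict order is preserved in the query string.
params = {
    "secu": "291",
    "boerse_id": "6",
    "clean_split": "1",
    "clean_payout": "0",
    "clean_bezug": "1",
    "min_time": "8.2.2016",
    "max_time": "8.2.2017",
    "trenner": ";",
    "go": "Download",
}

download_url = build_download_url(params)
print(download_url)
```

Passing the same dictionary as `requests.get(BASE_URL, params=params)` lets requests do this encoding for you.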

You can parse the page to get the parameters you need, then build the download URL and pass it to pandas.

Use pandas to read from the link:

import pandas as pd
# trenner=%3B asks for ';' as the separator, so pandas has to be told about it
df = pd.read_csv('http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download', sep=';')
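If you already have the response text (for example from requests), a rough sketch of turning it into a DataFrame without saving a file first could look like this. The sample rows and column names are made up for illustration, and `decimal=","` is an assumption that the file uses German-style decimal commas; check the real file before relying on it. `csv_text_to_frame` is a hypothetical helper name.

```python
import io

import pandas as pd


def csv_text_to_frame(csv_text, sep=";", decimal=","):
    """Parse semicolon-separated CSV text into a DataFrame."""
    # StringIO lets read_csv treat an in-memory string like a file
    return pd.read_csv(io.StringIO(csv_text), sep=sep, decimal=decimal)


# Made-up sample data; the real column names may differ.
sample = (
    "Datum;Erster;Schlusskurs\n"
    "2017-02-08;148,50;150,25\n"
    "2017-02-07;147,00;148,10\n"
)

frame = csv_text_to_frame(sample)
print(frame)
```

With a live response this would be called as `csv_text_to_frame(requests.get(url).text)`.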

How to get the parameters:

import requests, bs4

url = 'http://www.ariva.de/adidas-aktie/historische_kurse'
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# collect the name/value pair of every <input> in the download form
payload = {field['name']: field['value'] for field in soup.select('form[name="histcsv"] input')}
# the form declares method="get", so send the fields as query parameters
response = requests.get('http://www.ariva.de/quote/historic/historic.csv', params=payload)
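If BeautifulSoup or lxml is not available, the same form fields can be collected with the standard library's `html.parser`. This is only a sketch of that alternative, not the method used above: `FormFieldCollector` is a hypothetical class name, the sample HTML is a shortened excerpt of the form from the question, and the trailing `outside` input is made up to show that fields outside the form are skipped.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside one named <form>."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.inside_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.inside_form = True
        elif tag == "input" and self.inside_form and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        # Leaving any form ends collection
        if tag == "form":
            self.inside_form = False


sample_html = """
<form action="/quote/historic/historic.csv" method="get" name="histcsv">
<input type="hidden" name="secu" value="291" />
<input type="hidden" name="boerse_id" value="6" />
<input id="minTime" name="min_time" value="8.2.2016" style="width:71px" />
<input id="trenner" name="trenner" value=";" style="width:25px" />
<input class="submitButton" name="go" value="Download" type="submit" />
</form>
<input name="outside" value="ignored" />
"""

collector = FormFieldCollector("histcsv")
collector.feed(sample_html)
fields = collector.fields
print(fields)
```

For the live page you would feed it `r.text` instead of `sample_html` and pass `fields` as the query parameters.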
宏杰李
  • Thanks a lot. Is it possible to read values such as secu=291&boerse_id=6 automatically instead of entering them manually? Furthermore, can I load the CSV file directly into a pandas DataFrame? Also, the GET request shown in Chrome's developer tools gives me a different link. – MCM Feb 08 '17 at 12:51
  • Thanks for the answer. I am still struggling with how to construct a simple scraper that reads the security ID automatically. – MCM Feb 08 '17 at 14:05
  • Perfect, thanks a lot! That is what I was looking for. Much appreciated. – MCM Feb 08 '17 at 15:51

I would suggest using Selenium, which executes JavaScript without any extra effort. You can also use PhantomJS as a headless browser.

kawadhiya21

You can get the response for the download via this GET request:

http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download

Here min_time and max_time are the two dates you need to provide, and trenner is the separator. You can receive the response and then write it to a file.

import requests
response = requests.get('http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download')

# the with block closes the file once the text has been written
with open('download.csv', 'w') as f:
    f.write(response.text)
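To avoid hard-coding the two dates, they can be computed. The sketch below assumes the server expects the day.month.year form without leading zeros, as in the form's default values (8.2.2016); `ariva_date` and `date_range_params` are hypothetical helper names.

```python
from datetime import date, timedelta


def ariva_date(d):
    """Format a date as D.M.YYYY without leading zeros, e.g. 8.2.2016."""
    return f"{d.day}.{d.month}.{d.year}"


def date_range_params(end, days=365):
    """Return min_time/max_time for a window of `days` days ending at `end`."""
    start = end - timedelta(days=days)
    return {"min_time": ariva_date(start), "max_time": ariva_date(end)}


# 366 days, because the window contains 29 February 2016
range_params = date_range_params(date(2017, 2, 8), days=366)
print(range_params)
```

The resulting dictionary can be merged into the other query parameters before the request is sent.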
harshil9968
  • Thanks a lot. I am repeating my comment from above: Is it possible to read values such as secu=291&boerse_id=6 automatically instead of entering them manually? Furthermore, can I load the CSV file directly into a pandas DataFrame? Also, the GET request shown in Chrome's developer tools gives me a different link. – MCM Feb 08 '17 at 12:52