
User Chrisvdberge helped me create the following code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url_DAX = 'https://www.eurexchange.com/exchange-en/market-data/statistics/market-statistics-online/100!onlineStats?viewType=4&productGroupId=13394&productId=34642&cp=&month=&year=&busDate=20191114'
req = requests.get(url_DAX, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_DAX.csv')
print(df)

url_DOW = 'https://www.cmegroup.com/trading/equity-index/us-index/e-mini-dow_quotes_settlements_futures.html'
req = requests.get(url_DOW, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_DOW.csv')
print(df)

url_NASDAQ = 'https://www.cmegroup.com/trading/equity-index/us-index/e-mini-nasdaq-100_quotes_settlements_futures.html'
req = requests.get(url_NASDAQ, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_NASDAQ.csv')
print(df)

url_CAC = 'https://live.euronext.com/fr/product/index-futures/FCE-DPAR/settlement-prices'
req = requests.get(url_CAC, verify=False)
html = req.text
soup = BeautifulSoup(html, 'lxml')
df = pd.read_html(str(html))[0]
df.to_csv('results_CAC.csv')
print(df)

I get the following result:

  • 3 .csv files are created: results_DAX.csv (everything is fine here, it has the values I want), plus results_DOW.csv and results_NASDAQ.csv (these do not contain the values I want, and I don't understand why).

  • As you can see in the code, 4 files should be created, not just 3.

So my questions are:

  • How do I get 4 .csv files?

  • How do I get the values into results_DOW.csv and results_NASDAQ.csv (and maybe also into results_CAC.csv)?

Thank you for your answers! :)

manny-
  • For `https://live.euronext.com/fr/product/index-futures/FCE-DPAR/settlement-prices`, there are no `<table>` tags in the HTML to parse. The other two look like dynamic sites, so you'd need to use Selenium or start a session. Also, there is no need to combine `requests`, `beautifulsoup` and `read_html`: pandas' `read_html` can fetch the URL and parse the HTML by itself (it uses lxml or BeautifulSoup under the hood)
    – chitown88 Nov 16 '19 at 16:06
  • It also looks like there is an API/XHR endpoint that you can get this data from, for example: `https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/318/FUT?tradeDate=11/15/2019&strategy=DEFAULT&pageSize=500&_=1573920407835` – chitown88 Nov 16 '19 at 16:08
  • Where did you find the link?! :o I've spent hours searching for that. I've tried using this link instead of the other one, but no .csv file is created. I guess that's because there is no table at this link. OK for the dynamic sites, so that's the reason why it doesn't work. Could you help me use Selenium to solve my problem? :) – manny- Nov 16 '19 at 16:15
  • You can find it by looking under the Network -> XHR tab in Dev Tools (Ctrl+Shift+I in Chrome). You might need to reload the page, but then you can see the requests made to get the different data – chitown88 Nov 16 '19 at 16:19
  • Wow, I've learnt something with the XHR tab! Thanks. Do you think that, apart from the tradeDate part, other parts of the link will change every day? – manny- Nov 16 '19 at 16:26
  • The root URL won't change. But yes, you may need to change the parameters in the `payload` to fit what you want. So I'm guessing that `'tradeDate'` will probably need to be different to get the data for the current date – chitown88 Nov 16 '19 at 16:37
  • Yes Sir! You got it :P That's what I'll try to do now ! – manny- Nov 16 '19 at 16:39
  • I've added `import time` and `date = time.strftime("%m/%d/%Y")` to get the required format. Then I wrote `'tradeDate': {date},` in the payload, but it doesn't work. I guess that's not the right way to insert a variable in the payload? Any idea? :) – manny- Nov 16 '19 at 17:05
  • Not sure. I’ll check it out today – chitown88 Nov 17 '19 at 13:06
  • I added `str(time.strftime('%Y%m%d'))` to the URL and it seemed to work. But now I want to add a condition: if today is between Monday and Friday, then ... (`if datetime.datetime.now().isoweekday() in range(1, 7):`), and I have some issues with it. I'll create another question! You've already answered the main question here :) – manny- Nov 17 '19 at 13:10
  • https://stackoverflow.com/questions/58901554/syntax-to-use-to-make-minus-x-days-with-python-3-8 – manny- Nov 17 '19 at 14:36
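
The two snags raised in the comments above can be sketched as follows. Writing `'tradeDate': {date}` builds a Python set containing the string rather than the string itself (braces only interpolate inside an f-string), and `range(1, 7)` covers `isoweekday()` values 1 to 6, which is Monday to Saturday. This is only a minimal sketch: the helper names `most_recent_weekday` and `build_payload` are hypothetical, exchange holidays are ignored, and the `_` key is left out without checking whether the endpoint needs it.

```python
from datetime import date, timedelta


def most_recent_weekday(day):
    """Step back from `day` until it falls on Monday to Friday."""
    # isoweekday(): Monday is 1 and Sunday is 7, so 6 and 7 are the weekend.
    while day.isoweekday() > 5:
        day -= timedelta(days=1)
    return day


def build_payload(day):
    """Build the query parameters with the date as a plain string value."""
    return {
        # Pass the formatted string directly; no braces around the variable.
        'tradeDate': most_recent_weekday(day).strftime('%m/%d/%Y'),
        'strategy': 'DEFAULT',
        'pageSize': '500',
    }


# Sunday 17 November 2019 falls back to Friday 15 November 2019.
print(build_payload(date(2019, 11, 17)))
```

For a plain Monday-to-Friday test without stepping back, `day.isoweekday() in range(1, 6)` or `day.isoweekday() <= 5` would do.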

1 Answer


Try the code below for the DAX, DOW and NASDAQ data. The last site (the Euronext page for the CAC) is a little trickier, so you'd need to try out Selenium for that one:

from datetime import date, timedelta

import pandas as pd
import requests

url_DAX = 'https://www.eurexchange.com/exchange-en/market-data/statistics/market-statistics-online/100!onlineStats?viewType=4&productGroupId=13394&productId=34642&cp=&month=&year=&busDate=20191114'
df = pd.read_html(url_DAX)[0]
df.to_csv('results_DAX.csv')
print(df)



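# Request the settlements from two days ago; adjust the offset so that
# the date lands on a trading day.
dt = date.today() - timedelta(days=2)
dateParam = dt.strftime('%m/%d/%Y')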


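# E-mini Dow settlements (product 318), served as JSON by the XHR endpoint
url_DOW = 'https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/318/FUT'
payload = {
    'tradeDate': dateParam,
    'strategy': 'DEFAULT',
    'pageSize': '500',
    '_': '1573920502874'}  # value copied from the request seen in Dev Tools
response = requests.get(url_DOW, params=payload).json()
# The rows of the settlements table are under the 'settlements' key
df = pd.DataFrame(response['settlements'])
df.to_csv('results_DOW.csv')
print(df)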


# E-mini Nasdaq-100 settlements (product 146), same endpoint and parameters
url_NASDAQ = 'https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/146/FUT'
payload = {
    'tradeDate': dateParam,
    'strategy': 'DEFAULT',
    'pageSize': '500',
    '_': '1573920650587'}  # value copied from the request seen in Dev Tools
response = requests.get(url_NASDAQ, params=payload).json()
df = pd.DataFrame(response['settlements'])
df.to_csv('results_NASDAQ.csv')
print(df)
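
Since the DOW and NASDAQ requests differ only in the product number in the URL, the two blocks could be folded into one loop. The following is a rough sketch rather than a tested solution: `PRODUCT_IDS`, `BASE_URL`, `build_request` and `save_all` are names made up for illustration, the `_` value is assumed to be just a cache-busting millisecond timestamp, and whether the endpoint still answers in this form (or needs extra headers) has not been verified.

```python
import time
from datetime import date

# Product numbers taken from the two URLs used in the code above.
PRODUCT_IDS = {'DOW': 318, 'NASDAQ': 146}
BASE_URL = 'https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/{}/FUT'


def build_request(product_id, trade_date):
    """Return (url, params) for one product; pure, no network access."""
    params = {
        'tradeDate': trade_date.strftime('%m/%d/%Y'),
        'strategy': 'DEFAULT',
        'pageSize': '500',
        # Assumption: '_' is only a cache-busting millisecond timestamp.
        '_': str(int(time.time() * 1000)),
    }
    return BASE_URL.format(product_id), params


def save_all(trade_date):
    """Fetch every product and write one CSV per name (needs network)."""
    # Third-party imports are kept local so build_request stays dependency-free.
    import pandas as pd
    import requests

    for name, product_id in PRODUCT_IDS.items():
        url, params = build_request(product_id, trade_date)
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        df = pd.DataFrame(response.json()['settlements'])
        df.to_csv(f'results_{name}.csv')


# Building a request does not touch the network, so it is safe to inspect.
url, params = build_request(PRODUCT_IDS['DOW'], date(2019, 11, 15))
print(url, params['tradeDate'])
```

Calling `save_all(date(2019, 11, 15))` would then perform the real requests and write `results_DOW.csv` and `results_NASDAQ.csv`.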
chitown88