
I am relatively new to Python, so I apologize if this is a 'bush league' question.

I am trying to retrieve the WTI futures prices from this website: https://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_globex.html

Which libraries should I be using? How will I need to adjust the output when it is pulled from the website?

Currently operating in Python 3.6.8 with the pandas, numpy, requests, urllib3, BeautifulSoup, and json libraries. I am not exactly sure whether these are the right libraries, and if they are, which functions I should be using.

Here is a basic version of the code:

import urllib3
from bs4 import BeautifulSoup

wtiFutC = 'https://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_globex.html'
http = urllib3.PoolManager()
response2 = http.request('GET', wtiFutC)
print(type(response2.data))  # check the type of the data produced - bytes
print(response2.data)  # prints out the raw response body

soup2 = BeautifulSoup(response2.data.decode('utf-8'), features='html.parser')
print(type(soup2))  # check the type of the data produced - 'bs4.BeautifulSoup'
print(soup2)  # prints out the parsed BeautifulSoup document

I want a way to see the 'Last' price for the WTI futures across the whole curve. Instead, I am seeing something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!--[if (gt IE 9) |!(IE)]><!-->
<html class="cmePineapple no-js" lang="en" xml:lang="en" 
xmlns="http://www.w3.org/1999/xhtml">
<!--<![endif]-->

Any help or direction would be greatly appreciated. Thank you so much! :)

3 Answers


The data on that page is generated by JavaScript, which makes it hard to extract with packages like requests. If it's not too big of a deal, I suggest you look for a different source that uses minimal or no JavaScript, then use requests to get the page source and extract the data from it.

You extract the data with libraries like BeautifulSoup or re (or even pandas in some cases), and feed the results to libraries like numpy or pandas if you want to analyze the data or run calculations on it.
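
For illustration only, here is a rough sketch of that workflow; the URL is a placeholder for whatever static source you end up using, and the assumption that the quotes sit in a plain <table> may not hold:

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder URL - substitute a source whose quotes appear in the raw HTML
url = 'https://example.com/wti-quotes.html'
html = requests.get(url).text

# Parse the page and pull out the first <table>, assuming the quotes live in one
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

if table is not None:
    # pandas can turn an HTML table straight into a DataFrame for analysis
    df = pd.read_html(str(table))[0]
    print(df.head())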

Otherwise, I suggest you take a look at Selenium for JavaScript support.
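
For example, a minimal Selenium sketch along those lines, assuming you have Chrome and a matching chromedriver on your PATH; the fixed sleep is a crude stand-in for an explicit wait:

import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()  # assumes chromedriver is installed and on your PATH
driver.get('https://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_globex.html')

time.sleep(5)  # crude wait for the quotes to render; WebDriverWait would be more robust

# page_source now holds the JavaScript-rendered HTML, which BeautifulSoup can parse
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()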

Xosrov

Use Requests-HTML. It’s a great resource if you’re already familiar with requests.
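
A minimal sketch with Requests-HTML, assuming the quotes table shows up in the rendered HTML (render() downloads Chromium on first use); the plain 'table' selector is a guess and may need adjusting:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_globex.html')

# Execute the page's JavaScript (the first call downloads a Chromium build)
r.html.render()

# Print the text of the first rendered table, if one is present
tables = r.html.find('table')
if tables:
    print(tables[0].text)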

Kamikaze_goldfish

Use the endpoint the page itself calls and parse the column of interest (and the date) out of the JSON:

import requests

# The quotes table on the page is populated from this JSON endpoint
r = requests.get('https://www.cmegroup.com/CmeWS/mvc/Quotes/Future/4707/G?quoteCodes=null&_=1560171518204').json()

# Pair each contract's expiration date with its 'Last' price across the curve
last_quotes = [(item['expirationDate'], item['last']) for item in r['quotes']]
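
If you want the whole curve in tabular form, a possible follow-up (assuming pandas is available):

import pandas as pd

df = pd.DataFrame(last_quotes, columns=['expirationDate', 'last'])
print(df)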
QHarr
  • I realized that I made a mistake with the original link that I posted (my apologies) - the link should have been: https://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_quotes_globex.html - how did you get the link in that json style? – Unknown1984 Jun 10 '19 at 15:30
  • I used the network tab in dev tools to find the endpoint the page was using to update content. That returns json. – QHarr Jun 10 '19 at 15:38
  • Sorry to keep bugging you on this. In my Google Chrome browser, I have the dev tools screen up and I selected the network tab. At this point I'm pretty lost regarding how I find the "endpoint" that the page was using to update content. Any additional direction you could provide? – Unknown1984 Jun 10 '19 at 16:57
  • click refresh to load all the network activity then click any url in the network tab and press Ctrl + F - in the search box that appears enter any value from the page that you would expect to be retrieved. You should then get a list of all matches in network traffic. – QHarr Jun 10 '19 at 17:08
  • that was perfect! Thank you again for your direction! Super helpful!!! – Unknown1984 Jun 10 '19 at 17:52