Webscraping with urllib

Question

I am looking to get some information off the CME website. Namely, I want to get the Futures Yield and the Futures DV01 for the 10y Treasury Note Future. I found this little snippet in an old thread:

import urllib.request
class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"
opener = AppURLopener()
fh = opener.open('http://www.cmegroup.com/tools-information/quikstrike/treasury-analytics.html')

It throws a deprecation warning, and I am not quite sure how to get the info from the website. Can someone please enlighten me as to what the new syntax should be and how to get the info? Thanks.
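On the deprecation warning itself: FancyURLopener is deprecated in Python 3, and the usual replacement is to pass headers through urllib.request.Request. Below is a minimal sketch of that form. The names CME_URL, build_request and fetch_html are made up for illustration. Note that the comments below suggest the data is rendered by JavaScript inside an iframe, so this request will probably return only the outer page, not the table.

```python
import urllib.request

# URL taken from the question; the helper names below are hypothetical.
CME_URL = "http://www.cmegroup.com/tools-information/quikstrike/treasury-analytics.html"


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=30):
    """Download the page and return its decoded HTML (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# html = fetch_html(CME_URL)  # uncomment to actually send the request
```

Building the request separately from sending it keeps the header logic easy to check without touching the network.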

Do you have Selenium installed on your PC? If so, let me know. There are two barriers to cross to reach the data you are looking for: first, the page relies on JavaScript, and second, the data sits inside an "iframe" that you need to switch into before you can collect it. You will need Selenium to get past both. — SIM, Aug 29 '17 at 11:24
One thing you might be able to do is purchase the data from the provider. That's likely the best and most straightforward way of doing things. There are lots of man hours that go into creating those numbers. Feel free to comment if you'd like to know how you can do that. FYI - we will combat the scraping in our next build. Thanks! — Nick Howard, Mar 19 '19 at 15:51
It's worth noting that I haven't found anything against scraping in your Terms of Service or your Rulebook. While you can combat this from a technical perspective, it's usually a moot point, because a persistent consumer might hire cheap labor to scrape it (copy/paste) manually, bypassing all bot detection and even captchas. You should speak to your legal department if it's a big issue, and add it clearly to your terms of service. That would allow you to pursue legal action as well as technical measures. — Madara's Ghost, Mar 19 '19 at 16:14
Hi @NickHoward. Would indeed be cool if I can get the DV01 for TY some other way. Would need daily data via some API ideally. Since it is not rocket science I can of course calculate it myself too. — steff, Mar 21 '19 at 03:52
Also I am not sure what the difference is between me making use of the data that is shown on the CME website or my computer copying it in order to make use of it on my behalf. Same thing really. Am a paying customer of the CME. — steff, Mar 21 '19 at 04:39

score 2 · Accepted Answer · answered Aug 29 '17 at 12:19

Run this script once you have installed Selenium and BeautifulSoup.

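from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.cmegroup.com/tools-information/quikstrike/treasury-analytics.html")

# The table lives inside an iframe, so switch into it before reading the page source.
# (switch_to_frame and find_element_by_tag_name were removed in Selenium 4.)
driver.switch_to.frame(driver.find_element(By.TAG_NAME, "iframe"))
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()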

table = soup.select('table.grid')[0]
list_of_rows = [[t_data.text for t_data in item.select('th,td')]
                for item in table.select('tr')]

for data in list_of_rows:
    print(data)

I think this is the table [partial picture] you are after:
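To pick out individual fields such as the Futures Yield and Futures DV01 mentioned in the question, the rows printed above can be turned into dictionaries keyed by column header. This is only a sketch: it assumes the first row of list_of_rows holds the headers and that the headers use those exact labels, which may not match the real table. The function name rows_to_records and the sample rows are hypothetical placeholders, not real CME values.

```python
def rows_to_records(list_of_rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by header text."""
    if not list_of_rows:
        return []
    header = [cell.strip() for cell in list_of_rows[0]]
    records = []
    for row in list_of_rows[1:]:
        cells = [cell.strip() for cell in row]
        if len(cells) != len(header):
            continue  # skip spacer or malformed rows
        records.append(dict(zip(header, cells)))
    return records


# Placeholder rows for illustration only; column labels and values are assumptions.
sample_rows = [
    ["Contract", "Futures Yield", "Futures DV01"],
    ["  EXAMPLE ", "x.xxx", "yy.yy"],
    [""],
]
records = rows_to_records(sample_rows)
for record in records:
    print(record["Futures Yield"], record["Futures DV01"])
```

Rows whose cell count differs from the header are dropped rather than padded, so a layout change in the page shows up as missing records instead of misaligned values.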

answered Aug 29 '17 at 12:19 by SIM

That works beautifully. I am using Safari. Thanks so much. – steff Aug 29 '17 at 12:42
How do I select a different page (via the 'Contracts' button) to retrieve data from there? I tried this, and a lot more, but no luck: elements = driver.find_elements_by_xpath("//ul[@class='nav']") – Yugmorf Sep 10 '18 at 17:08
To access data for a different contract, the following picks up the nodes, but I don't know enough Selenium to work out how to select and load them: driver.get("https://cmegroup-tools.quikstrike.net/User/QuikStrikeView.aspx?viewitemid=IntegratedStrikeAsYield&insid=12300579"); elements = driver.find_elements_by_xpath("//div[@class='group']") – Yugmorf Sep 11 '18 at 02:54
