Scrape dynamic data in an HTML table using Python 3.6

Question

I am new to scraping websites. I want to locate and download the "price, delta, vol" data from this webpage. The data is in a table and scattered under tags for the different tenors. I tried the following Python code:

import requests
from bs4 import BeautifulSoup

url = 'https://widget.sentryd.com/widget/#/7D5623FD-E378-411E-A354-C93D10583540'  # web page URL
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')
script = soup.find_all('table')

But I only get a blank list []. How can I locate the data table and extract the data from the website? Thank you very much.

This is the closest solution I have found, but it does not return the data I want.

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://widget.sentryd.com/widget/#/7D5623FD-E378-411E-A354-C93D10583540' 
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(url,headers = headers)
soup = BeautifulSoup(page.content,'html.parser')
tables = pd.read_html(soup.text)
print(tables[0])

The result is a DataFrame with the field names but not the data I need. Please advise me on how to do this. Thanks.

Dan-Dev · Accepted Answer · 2018-08-02T13:22:25.840

The web page is rendered with JavaScript using AJAX. You can access the data through the same API that the AJAX requests use. You get a JSON response, which you can parse to get the data you want. You can edit the URL to return just the currency you want, e.g.

url = "https://widget.sentryd.com/widget/api/instruments/latest/EUR_USD?ts=1533035674098"
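
Editing the URL by hand can also be wrapped in a small helper. The following is a minimal sketch, not part of the widget's documented API: the function name `build_latest_url` is hypothetical, and it assumes that `ts` is simply the current time in milliseconds (which matches the value in the URL above) and that the server accepts any current value.

```python
import time
from urllib.parse import urlencode

# Endpoint taken from the URL above.
BASE_URL = "https://widget.sentryd.com/widget/api/instruments/latest/"


def build_latest_url(instruments, now_ms=None):
    """Build the 'latest' URL for a list of instrument names such as 'EUR_USD'.

    Assumption: 'ts' is a millisecond timestamp; pass now_ms to fix it.
    """
    if now_ms is None:
        now_ms = round(time.time() * 1000)
    # Instruments are comma-separated in the path, as in the original URLs.
    return BASE_URL + ",".join(instruments) + "?" + urlencode({"ts": now_ms})


print(build_latest_url(["EUR_USD", "GBP_USD"]))
```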

Here is the code to return all the currencies.

import requests
url = "https://widget.sentryd.com/widget/api/instruments/latest/EUR_USD,GBP_USD,USD_JPY,USD_CAD,USD_CHF,AUD_USD,NZD_USD,EUR_GBP,EUR_JPY,EUR_CAD,EUR_CHF,EUR_AUD,GBP_JPY,GBP_CAD,GBP_CHF,GBP_AUD,USD_CNH,XAU_USD?ts=1533035674098"
r = requests.get(url)
j = r.json()
# Each key of the JSON object maps to the latest quote for one instrument.
for i in j:
    print(j[i]['instrument'], j[i]['timestamp'], j[i]['bidPrice'], j[i]['offerPrice'], j[i]['currency'])
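
If you would rather collect the quotes than print them, something along these lines might work. It is only a sketch: the helper name `quotes_to_rows` is hypothetical, the field names are the ones used in the loop above, and the sample payload is made up to mimic the shape that loop implies (a JSON object whose values are per-instrument quotes).

```python
# Fields printed by the loop above.
FIELDS = ("instrument", "timestamp", "bidPrice", "offerPrice", "currency")


def quotes_to_rows(payload):
    """Turn the parsed JSON object into a list of flat dicts, one per instrument."""
    rows = []
    for key in payload:
        quote = payload[key]
        # Keep only the known fields; a missing field becomes None.
        rows.append({field: quote.get(field) for field in FIELDS})
    return rows


# Made-up sample, NOT real data returned by the API.
sample = {
    "EUR_USD": {
        "instrument": "EUR_USD",
        "timestamp": 1533035674098,
        "bidPrice": 1.169,
        "offerPrice": 1.1692,
        "currency": "USD",
        "somethingElse": "ignored",
    }
}
rows = quotes_to_rows(sample)
print(rows)
```

With the real response you would call `quotes_to_rows(r.json())`; the resulting list can be passed to `pandas.DataFrame(rows)` if you want a table.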

Updated in response to comment.

You can get the rest of the raw data like this.

import requests
import time

s = requests.Session()
s.headers.update({'Content-Type': 'application/x-www-form-urlencoded',
                  'Accept': 'application/json',
                  'authorization': 'Basic Y3VycmVuY3lfd2lkZ2V0OmN1cnJlbmN5X3dpZGdldA=='
                  })

url = "https://widget.sentryd.com/widget/sentry/api/Pricing"
data = {'Type': 'Pricing',
        'Products': 'EUR/GBP',
        'POSTAccessCode': 'sentryPricingApi',
        'POSTAccessPassword': 'sentrypricingapi_213',
        'timestamp': str(round(time.time() * 1000))}

r = s.post(url, data=data)
print(r.json())

data['Type']= 'Volatilities'
r = s.post(url, data=data)
print(r.json())
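
A note on the 'authorization' header used above: it is ordinary HTTP Basic authentication, i.e. the Base64 encoding of "username:password". The sketch below decodes the value from the code above and rebuilds it; the helper name `basic_auth_header` is hypothetical.

```python
import base64

# Token copied from the 'authorization' header above (without the "Basic " prefix).
token = "Y3VycmVuY3lfd2lkZ2V0OmN1cnJlbmN5X3dpZGdldA=="

# Decode it to see the credentials it carries.
decoded = base64.b64decode(token).decode("ascii")
username, password = decoded.split(":", 1)
print(username, password)


def basic_auth_header(username, password):
    """Build the value of a Basic authorization header from credentials."""
    raw = f"{username}:{password}".encode("ascii")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Rebuilding the header from the decoded credentials gives the original value.
print(basic_auth_header(username, password))
```

Since this is standard Basic auth, passing `auth=(username, password)` to `requests` should produce the same header, so the hard-coded string is not strictly needed.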

The file https://widget.sentryd.com/widget/currency-widget.min.js processes this data; you can read it for examples of how to interpret the data.

If you don't want raw data, then scrape the page with a technology that renders the JavaScript-generated web page; see my answer to "Scraping Google Finance (BeautifulSoup)".
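
As a rough illustration of the rendering route, here is a sketch that uses Selenium with headless Chrome. Selenium is just one such technology, picked here for the example, and not necessarily what the linked answer uses. The names `render_page` and `extract_tables` are hypothetical, and waiting for any `<table>` element is an assumption: if the page already contains an empty table before the data arrives, you would need to wait for a data cell instead.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table.

    Sketch only: nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in an HTML string as lists of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def render_page(url, timeout=20):
    """Load the page in headless Chrome and return the rendered HTML."""
    # Imported lazily so extract_tables works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the data is ready once a <table> element exists.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Usage (needs Chrome and Selenium installed):
# html = render_page("https://widget.sentryd.com/widget/#/7D5623FD-E378-411E-A354-C93D10583540")
# for table in extract_tables(html):
#     print(table)
```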

Thanks, but this is not the data I need. I can get the API addresses for these exchange rates from Chrome --> Developer Tools --> Network --> XHR. Besides the exchange rates, this web page has a table of updating option prices. I cannot find the API for these option prices, such as 'delta', 'vol', 'skew', and 'vega'. — ERIC, Aug 01 '18 at 08:02
I tried the code above. It responds with "Invalid access code". I checked the JS file and found the POST method; nothing seems to be wrong with the data I post. I am still trying your alternative ways of scraping the web page. — ERIC, Aug 02 '18 at 10:39
I added the authorization it needs. I don't think this code changes, but you can get it from https://widget.sentryd.com/widget/currency-widget.min.js. It should work now. — Dan-Dev, Aug 02 '18 at 13:24
