
I am new to scraping websites. I want to locate and download the "price, delta, vol" data from this webpage. The data is in a table and is spread across the tenor tags. I tried the following Python code:

import requests
from bs4 import BeautifulSoup

url = 'https://widget.sentryd.com/widget/#/7D5623FD-E378-411E-A354-C93D10583540'  # web page address
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')
script = soup.find_all('table')

But I only get an empty list, []. How can I locate the data table and extract the data from the website? Thank you very much.


This is the closest solution I could find, but it does not return the data I want.

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://widget.sentryd.com/widget/#/7D5623FD-E378-411E-A354-C93D10583540' 
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(url,headers = headers)
soup = BeautifulSoup(page.content,'html.parser')
tables = pd.read_html(soup.text)
print(tables[0])

The result is a DataFrame with the field names but not the data I need. Please advise me on how to proceed. Thanks.

ERIC

1 Answer


The web page is rendered with JavaScript, which loads its data using AJAX. You can access the data by calling the same API that the AJAX requests use. It returns a JSON response, which you can parse to get the data you want. You can edit the URL to return just the currency pair you want, e.g.

url = "https://widget.sentryd.com/widget/api/instruments/latest/EUR_USD?ts=1533035674098"
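As a rough sketch of that URL editing, the helper below joins a list of instrument codes into the path. The name `latest_url` is hypothetical, and treating `ts` as the current time in milliseconds is an assumption based only on the look of the value in the URL above.

```python
import time

BASE = "https://widget.sentryd.com/widget/api/instruments/latest/"


def latest_url(instruments, ts=None):
    """Build the 'latest' URL for instrument codes such as 'EUR_USD'.

    Hypothetical helper, not part of any library.
    """
    if ts is None:
        # Assumption: ts is a millisecond Unix timestamp.
        ts = int(time.time() * 1000)
    return f"{BASE}{','.join(instruments)}?ts={ts}"


# Reproduces the single-currency URL shown above.
print(latest_url(["EUR_USD"], ts=1533035674098))
```

Calling `latest_url(["EUR_USD", "GBP_USD"])` without `ts` would use the current time instead.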

Here is the code to return all the currencies.

import requests

url = "https://widget.sentryd.com/widget/api/instruments/latest/EUR_USD,GBP_USD,USD_JPY,USD_CAD,USD_CHF,AUD_USD,NZD_USD,EUR_GBP,EUR_JPY,EUR_CAD,EUR_CHF,EUR_AUD,GBP_JPY,GBP_CAD,GBP_CHF,GBP_AUD,USD_CNH,XAU_USD?ts=1533035674098"
r = requests.get(url)
j = r.json()
for i in j:
    print(j[i]['instrument'], j[i]['timestamp'], j[i]['bidPrice'], j[i]['offerPrice'], j[i]['currency'])
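If you would rather collect the values than print them, something along these lines may help. It assumes the response is a dict keyed by instrument whose entries carry the five fields used in the loop above. The `sample` payload and its numbers are made up for illustration, and `to_rows` is a hypothetical helper.

```python
FIELDS = ("instrument", "timestamp", "bidPrice", "offerPrice", "currency")


def to_rows(payload):
    """Flatten the JSON dict into one tuple per instrument (hypothetical helper)."""
    # .get() leaves None in place of any field that is missing.
    return [tuple(entry.get(field) for field in FIELDS)
            for entry in payload.values()]


# Made-up sample in the assumed shape; real values come from r.json().
sample = {
    "EUR_USD": {"instrument": "EUR_USD", "timestamp": 1533035674098,
                "bidPrice": 1.17, "offerPrice": 1.1702, "currency": "USD"},
}

for row in to_rows(sample):
    print(row)
```

The resulting list of tuples can be passed to `pandas.DataFrame(rows, columns=FIELDS)` if you want a table.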

Updated in response to comment.

You can get the rest of the raw data like this.

import requests
import time

s = requests.Session()
s.headers.update({'Content-Type': 'application/x-www-form-urlencoded',
                  'Accept': 'application/json',
                  'authorization': 'Basic Y3VycmVuY3lfd2lkZ2V0OmN1cnJlbmN5X3dpZGdldA=='
                  })

url = "https://widget.sentryd.com/widget/sentry/api/Pricing"
data = {'Type': 'Pricing',
        'Products': 'EUR/GBP',
        'POSTAccessCode': 'sentryPricingApi',
        'POSTAccessPassword': 'sentrypricingapi_213',
        'timestamp': str(round(time.time() * 1000))}

r = s.post(url, data=data)
print(r.json())

data['Type']= 'Volatilities'
r = s.post(url, data=data)
print(r.json())
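For what it's worth, the `authorization` value in the session headers is ordinary HTTP Basic authentication: the word `Basic` followed by the Base64 encoding of `username:password`. The sketch below decodes the header from the code above and rebuilds it; `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Return an HTTP Basic 'authorization' header value (hypothetical helper)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# The header value used in the session above.
header_from_answer = "Basic Y3VycmVuY3lfd2lkZ2V0OmN1cnJlbmN5X3dpZGdldA=="

# Decode the Base64 part back into its username and password.
token = header_from_answer.split(" ", 1)[1]
username, password = base64.b64decode(token).decode("utf-8").split(":", 1)
print(username, password)

# Re-encoding the pair gives back the same header.
print(basic_auth_header(username, password) == header_from_answer)
```

With requests, passing `auth=(username, password)` to `s.post` generates the same kind of header, so the hard-coded string is not strictly needed.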

The file https://widget.sentryd.com/widget/currency-widget.min.js processes this data; you can read it for examples of how to interpret the data.

If you don't want the raw data, scrape the page with a technology that renders the JavaScript-generated web page; see my answer to "Scraping Google Finance (BeautifulSoup)".
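To illustrate why rendering matters, here is a small sketch using only the standard library. Both HTML strings are made-up samples, not real output from the widget: the first stands for what `requests` downloads before any JavaScript runs, the second for what a browser-automation tool might hand back after rendering. `TableRows` and `table_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up samples: the page before and after JavaScript has run.
unrendered_html = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"
rendered_html = ("<table><tr><th>Tenor</th><th>Vol</th></tr>"
                 "<tr><td>1M</td><td>7.5</td></tr></table>")

print(table_rows(unrendered_html))
print(table_rows(rendered_html))
```

The first call finds no rows, which matches the empty list in the question; the table only exists once the JavaScript has run.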

Dan-Dev
  • Thanks, but this is not the data I need. I can already get the API addresses for these exchange rates from Chrome --> Developer Tools --> Network --> XHR. Besides the exchange rates, this webpage has a table of updating option prices. I cannot find the API for these option prices, such as 'delta', 'vol', 'skew', 'vega'. – ERIC Aug 01 '18 at 08:02
  • Updated in response to comment. – Dan-Dev Aug 01 '18 at 18:04
  • I tried the code above. It responds with "Invalid access code". I checked the JS file and found the POST method; nothing seems to be wrong with the {data} I post. I am still trying your alternative way of scraping the web page. – ERIC Aug 02 '18 at 10:39
  • I added the authorization it needs. I don't think this code changes, but you can get it from https://widget.sentryd.com/widget/currency-widget.min.js. It should work now. – Dan-Dev Aug 02 '18 at 13:24