
I tried to scrape this page using BeautifulSoup's find method, but I could not find the table values in the page's HTML. I found out that the website generates the data dynamically, through an internal API, when the page loads. How can I get at this data?

Thanks in advance.

Ahmed Sabry
  • I've not looked at the page, but if you've found the Ajax request to the API then you should be able to make your own request to it? – Robin Zigmond Mar 05 '19 at 15:46
  • One of these should do it: https://github.com/dhamaniasad/HeadlessBrowsers –  Mar 05 '19 at 15:49
  • @RobinZigmond thanks for your reply. Do you have any resources on how to establish an Ajax request from python? – Ahmed Sabry Mar 05 '19 at 15:51
  • @AhmedSabry You could try Selenium together with BeautifulSoup: it lets you retrieve data loaded by an Ajax request, because you can wait for the load to finish – Maaz Mar 05 '19 at 15:51
  • @AhmedSabry - there are several libraries that allow you to make HTTP requests in Python; [this](https://stackoverflow.com/questions/2018026/what-are-the-differences-between-the-urllib-urllib2-and-requests-module) is a good comparison of them. (Note that you will not be making "an Ajax request". That is just the term used when a webpage makes a request from client-side JavaScript so that the user doesn't have to reload the page; it doesn't make sense for a standalone program that collects data from the web.) – Robin Zigmond Mar 05 '19 at 15:54
  • Possible duplicate of [python - web scraping an ajax website using BeautifulSoup](https://stackoverflow.com/questions/44165387/python-web-scraping-an-ajax-website-using-beautifulsoup) – ChrisGPT was on strike Mar 05 '19 at 16:20
  • I'd avoid headless browsers if at all possible; they're very slow – SuperStew Mar 05 '19 at 16:21
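
As a rough illustration of the point made in the comments (a script just makes an ordinary HTTP request to the endpoint the page calls), here is a minimal sketch using only the standard library. The endpoint `https://example.com/api/quotes`, the parameter values, and the helper names `build_api_request` and `fetch_json` are placeholders for illustration, not anything taken from the site in question.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_request(base_url, params, headers):
    """Build (but do not send) a GET request with a query string."""
    url = f"{base_url}?{urlencode(params)}"
    return Request(url, headers=headers, method="GET")


def fetch_json(req, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical endpoint and parameters, for illustration only.
req = build_api_request(
    "https://example.com/api/quotes",
    {"root": "CL", "raw": "1"},
    {"Accept": "application/json"},
)
print(req.full_url)
```

Calling `fetch_json(req)` would then send the request. A real site may also expect cookies or a token header, which this sketch does not handle.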

1 Answer


This works for me. I had to dig around in the browser dev tools to find the API request, but found it:

import requests

# Page that sets the session cookies, and the internal API endpoint the page calls.
geturl = r'https://www.barchart.com/futures/quotes/CLJ19/all-futures'
apiurl = r'https://www.barchart.com/proxies/core-api/v1/quotes/get'

# Browser-like headers for the initial page request.
# Note: requests can only decode 'br' responses if a Brotli package is installed.
getheaders = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}

getpay = {
    'page': 'all'
}

# Load the page once inside a session so the XSRF-TOKEN cookie is stored.
s = requests.Session()
r = s.get(geturl, params=getpay, headers=getheaders)



# Headers for the API call; the XSRF token is read from the cookie set by the first request.
# Assumption: if the API rejects the token, the cookie value may be percent-encoded
# and need urllib.parse.unquote() before being sent.
headers = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.barchart.com/futures/quotes/CLJ19/all-futures?page=all',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
}

# Query parameters for the API request.
payload = {
    'fields': 'symbol,contractSymbol,lastPrice,priceChange,openPrice,highPrice,lowPrice,previousPrice,volume,openInterest,tradeTime,symbolCode,symbolType,hasOptions',
    'list': 'futures.contractInRoot',
    'root': 'CL',
    'meta': 'field.shortName,field.type,field.description',
    'hasOptions': 'true',
    'raw': '1'
}

# Call the API with the same session (so the cookies are sent) and decode the JSON response.
r = s.get(apiurl, params=payload, headers=headers)
j = r.json()
print(j)

>{'count': 108, 'total': 108, 'data': [{'symbol': 'CLY00', 'contractSymbol': 'CLY00 (Cash)', ........
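
Since the original goal was the table values, one possible follow-up is to flatten the `data` list of the response into CSV rows. This is only a sketch: `response_to_csv` is a hypothetical helper, and the sample dictionary below is made up to mirror the shape of the output above (the second record and all prices are placeholders, not real quotes).

```python
import csv
import io


def response_to_csv(response, fields):
    """Write the records under response['data'] to CSV text, one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in response.get("data", []):
        # Missing fields become empty cells; extra keys are ignored.
        writer.writerow(record)
    return buf.getvalue()


# Made-up sample shaped like the printed output; values are placeholders.
sample = {
    "count": 2,
    "total": 2,
    "data": [
        {"symbol": "CLY00", "contractSymbol": "CLY00 (Cash)", "lastPrice": "0.00"},
        {"symbol": "CLJ19", "contractSymbol": "CLJ19", "lastPrice": "0.00"},
    ],
}
csv_text = response_to_csv(sample, ["symbol", "contractSymbol", "lastPrice"])
print(csv_text)
```

With the real response, `response_to_csv(j, [...])` could be called with whichever field names from the `fields` parameter are of interest, assuming each record exposes them as top-level keys.
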
SuperStew