Scraping an AJAX web page using Python and requests

Question

I tried to scrape this page using BeautifulSoup's find method, but I could not find the table values in the HTML. It turns out the website fetches the data from an internal API when the page loads. How can I get at that data?

Thanks in advance.

I've not looked at the page, but if you've found the Ajax request to the API then you should be able to make your own request to it? — Robin Zigmond, Mar 05 '19 at 15:46
One of these should do it: https://github.com/dhamaniasad/HeadlessBrowsers — , Mar 05 '19 at 15:49
@RobinZigmond thanks for your reply. Do you have any resources on how to establish an Ajax request from python? — Ahmed Sabry, Mar 05 '19 at 15:51
@AhmedSabry You could try Selenium together with BeautifulSoup. It lets you retrieve data loaded by an Ajax request, because you can wait for the load to finish. — Maaz, Mar 05 '19 at 15:51
@AhmedSabry - there are several libraries that allow you to make HTTP requests in Python; [this](https://stackoverflow.com/questions/2018026/what-are-the-differences-between-the-urllib-urllib2-and-requests-module) is a good comparison of them. (Note that you will not be making "an Ajax request". That is just the term for a request a webpage makes from client-side JavaScript so that the user doesn't have to reload the page. The term doesn't make sense for a standalone program collecting data from the web.) — Robin Zigmond, Mar 05 '19 at 15:54
Possible duplicate of [python - web scraping an ajax website using BeautifulSoup](https://stackoverflow.com/questions/44165387/python-web-scraping-an-ajax-website-using-beautifulsoup) — ChrisGPT was on strike, Mar 05 '19 at 16:20
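
To illustrate the point made in the comments: from a script, the "Ajax request" is just an ordinary HTTP GET to a URL with a query string. The sketch below only builds such a URL with the standard library and does not send anything. The helper name `build_api_url` is made up for illustration, and a real endpoint may also expect specific headers or cookies.

```python
from urllib.parse import urlencode


def build_api_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


# Endpoint and parameter names are taken from the accepted answer in this thread.
url = build_api_url(
    "https://www.barchart.com/proxies/core-api/v1/quotes/get",
    {"list": "futures.contractInRoot", "root": "CL"},
)
print(url)
```

Any HTTP client (requests, urllib, and so on) can then fetch that URL, provided it also sends whatever headers and cookies the server checks.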

score 1 · Accepted Answer · answered Mar 05 '19 at 16:19

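This works for me. I had to dig around in the browser dev tools to find the API request. Loading the page first sets an XSRF-TOKEN cookie, and the API call then sends that value back in the x-xsrf-token header.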

import requests

# Page that shows the table. Loading it sets the session cookies,
# including the XSRF-TOKEN cookie that the API request needs.
page_url = 'https://www.barchart.com/futures/quotes/CLJ19/all-futures'
# Internal JSON endpoint the page calls (found in the browser dev tools).
api_url = 'https://www.barchart.com/proxies/core-api/v1/quotes/get'

user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36')

# Browser-like headers for the initial page load.
# Note: a 'br' response is only decoded if a Brotli package is installed.
page_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': user_agent,
}

page_params = {
    'page': 'all',
}

# A Session keeps the cookies from the first request for the second one.
session = requests.Session()
response = session.get(page_url, params=page_params, headers=page_headers)
response.raise_for_status()

# Headers for the API call: ask for JSON and echo the XSRF token cookie.
api_headers = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.barchart.com/futures/quotes/CLJ19/all-futures?page=all',
    'user-agent': user_agent,
    'x-xsrf-token': session.cookies.get_dict()['XSRF-TOKEN'],
}

# Query parameters copied from the request seen in the dev tools.
api_params = {
    'fields': 'symbol,contractSymbol,lastPrice,priceChange,openPrice,highPrice,lowPrice,previousPrice,volume,openInterest,tradeTime,symbolCode,symbolType,hasOptions',
    'list': 'futures.contractInRoot',
    'root': 'CL',
    'meta': 'field.shortName,field.type,field.description',
    'hasOptions': 'true',
    'raw': '1',
}

response = session.get(api_url, params=api_params, headers=api_headers)
response.raise_for_status()
data = response.json()
print(data)

>{'count': 108, 'total': 108, 'data': [{'symbol': 'CLY00', 'contractSymbol': 'CLY00 (Cash)', ........
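
Since the original goal was the table, here is a minimal sketch of turning a response shaped like the output above into CSV text. It assumes the rows live under the 'data' key, as the printed output suggests. The function name `response_to_csv` and the sample dictionary are hypothetical stand-ins, not part of the API.

```python
import csv
import io


def response_to_csv(payload, fields):
    """Render the 'data' list of an API response as CSV text.

    Keys not listed in fields are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in payload.get("data", []):
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample built only from the values visible in the output above.
sample = {
    "count": 1,
    "total": 1,
    "data": [{"symbol": "CLY00", "contractSymbol": "CLY00 (Cash)"}],
}
csv_text = response_to_csv(sample, ["symbol", "contractSymbol", "lastPrice"])
print(csv_text)
```

With the real response, you would pass the parsed JSON from the API call and the list of column names you asked for in the 'fields' parameter.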
