Save a whole web page instead of basic html with python requests for scraping

Question

I want to use Beautiful Soup to scrape this page: https://www.nseindia.com/option-chain#optionchain_equity and I fetch it with the requests module. However, requests seems to save only the basic HTML, not the main table on that page. Using Chrome to download "Webpage, Complete" works, but how can I automate that in Python? Also, without these headers the request times out, so I guess they are necessary. Code:

import requests

url = "https://www.nseindia.com/option-chain#optionchain_equity"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
response = requests.get(url, headers=headers, timeout=5)
# The context manager closes the file even if the write fails.
with open("nse.html", "w", encoding="utf-8") as file:
    file.write(response.text)

Also remember to call `file.close()` when finished; opening the file with `with` might be easier. – hedy, Aug 17 '20 at 10:50
Does this answer your question? [Web scraping program cannot find element which I can see in the browser](https://stackoverflow.com/questions/60904786/web-scraping-program-cannot-find-element-which-i-can-see-in-the-browser) — AMC, Aug 21 '20 at 19:02

score 1 · Answer 1 · answered Aug 17 '20 at 10:59

If you mainly need the table data: the page loads it through an AJAX call, so you can request that endpoint directly instead of the HTML page.

The following script saves that data to a JSON file.

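import json

import requests

headers = {'user-agent':"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"}

# This is the endpoint the page itself calls to fill the table.
res = requests.get("https://www.nseindia.com/api/option-chain-indices?symbol=NIFTY", headers=headers, timeout=10)
res.raise_for_status()  # stop on an HTTP error instead of trying to parse an error page

with open("data.json", "w") as f:
    json.dump(res.json(), f)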
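
Once `data.json` exists, it helps to look at its shape before writing any parsing code. The sketch below is only an illustration: `describe` and `describe_file` are hypothetical helper names, and nothing here assumes which keys the NSE response actually contains. It just summarizes whatever JSON it is given, down to a chosen depth.

```python
import json


def describe(value, max_depth=2, depth=0):
    """Return a compact summary of the shape of a parsed JSON value."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return "dict(%d keys)" % len(value)
        return {k: describe(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        if not value:
            return "list(0)"
        if depth >= max_depth:
            return "list(%d)" % len(value)
        # Only the first element is inspected; lists are assumed homogeneous.
        return ["list(%d) of" % len(value), describe(value[0], max_depth, depth + 1)]
    return type(value).__name__


def describe_file(path, max_depth=2):
    """Load a JSON file and return its shape summary as pretty-printed text."""
    with open(path, encoding="utf-8") as f:
        return json.dumps(describe(json.load(f), max_depth), indent=2)
```

Calling `print(describe_file("data.json"))` prints the nesting of the saved response, which should show where the rows of the table live.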

bigbounty

This works, but is there any way to get the JavaScript-rendered HTML of the page, including the table? The old version of the site didn't render the table dynamically, so I already have code ready for scraping it. – VarunS2002 Aug 18 '20 at 04:37

You can use `selenium` to download the html with the table – bigbounty Aug 18 '20 at 04:43
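
A minimal sketch of the Selenium suggestion, assuming Selenium 4 and a local Chrome installation. The function name `save_rendered_html` and the CSS selector `table tbody tr td` are assumptions for illustration; the selector should be adjusted to whatever the rendered table actually looks like, and the site may treat an automated browser differently from a normal one.

```python
from pathlib import Path


def save_rendered_html(url, out_path, wait_seconds=15):
    """Load url in headless Chrome and save the DOM after JavaScript has run."""
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the data appears as <td> cells inside a <table>.
        # Waiting for one cell gives the AJAX call time to fill the table.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table tbody tr td"))
        )
        html = driver.page_source  # the DOM as currently rendered
    finally:
        driver.quit()  # always release the browser process
    Path(out_path).write_text(html, encoding="utf-8")
    return html
```

The saved file can then be passed to existing Beautiful Soup code, for example `BeautifulSoup(save_rendered_html(url, "nse.html"), "html.parser")`.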

score 1 · Answer 2 · answered Aug 17 '20 at 11:04

If you want to save a whole web page, look for something like a headless Chrome API, for example:

Download file through Google Chrome in headless mode

To interpret (render) a web page, a plain Python HTTP request won't help: it only reads the response as a file stream. What you want is that file reading plus the behavior of a web browser, so a headless Chrome API is the way to go.
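
One way to try headless Chrome without any extra library is its `--dump-dom` command-line flag, which prints the rendered DOM to standard output. The sketch below is an illustration under assumptions: the executable name `google-chrome` varies by platform (it may be `chromium`, `chrome`, or a full path), and the helper names are hypothetical. Content that arrives through a late AJAX call may be missing from the dump if Chrome finishes before the call completes.

```python
import shutil
import subprocess


def dump_dom_command(url, chrome_binary="google-chrome"):
    """Build the argument list for a headless Chrome DOM dump."""
    return [chrome_binary, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_rendered_dom(url, chrome_binary="google-chrome", timeout=60):
    """Run headless Chrome and return the rendered DOM as a string."""
    if shutil.which(chrome_binary) is None:
        raise FileNotFoundError("Chrome executable not found: %s" % chrome_binary)
    result = subprocess.run(
        dump_dom_command(url, chrome_binary),
        capture_output=True,  # collect stdout, where the DOM is printed
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

If the table is missing from the output, a driver-based approach that can wait explicitly for an element is likely the better fit.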
