BeautifulSoup can only see what is baked directly into the HTML of a resource at the time it is first requested. The content you're trying to scrape isn't in that initial HTML: when you view this particular page in a browser, the DOM is populated asynchronously using JavaScript. Fortunately, logging your browser's network traffic reveals requests to a REST API, which serves the contents of the table as JSON. The following script makes an HTTP GET request to that API for a desired "dataset_id" (you can change the key-value pair in the params dict as desired), then dumps the response into a CSV file:
```python
import csv
import sys

import requests


def main():
    url = "https://portal.karandaaz.com.pk/api/table"
    params = {
        "dataset_id": "1000"
    }

    response = requests.get(url, params=params)
    response.raise_for_status()  # bail out on HTTP errors
    content = response.json()

    filename = "dataset_{}.csv".format(params["dataset_id"])
    with open(filename, "w", newline="") as file:
        # The API returns column names and rows separately;
        # pair them up so DictWriter can emit labeled rows.
        fieldnames = content["data"]["columns"]
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        for row in content["data"]["rows"]:
            writer.writerow(dict(zip(fieldnames, row)))

    return 0


if __name__ == "__main__":
    sys.exit(main())
```