Problems scraping a dynamic website with Beautiful Soup

Question

I would like to scrape crime news articles from the website below, but the soup object does not contain the div tag I need. Could anyone explain why?

import requests
from bs4 import BeautifulSoup 

page = requests.get("https://www.nst.com.my/news/crime-courts?page=1") 
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)

Content is provided dynamically, so look for an API or use Selenium. — HedgeHog, Dec 29 '21 at 18:17
Oh, whenever you spot data appearing after a bit of loading, it is most likely inserted using JavaScript. This dynamic data is thus not part of the HTML file that you request. Either look for the API that the page is calling, or look up how to parse dynamic webpages. — Chrimle, Dec 29 '21 at 23:22
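
A quick way to confirm the point made in these comments is to check whether text you can see in the browser is present in the raw response at all. The following is only a minimal sketch: the helper name `text_in_static_html` and both HTML strings are made up for illustration and are not taken from the site in the question.

```python
import html


def text_in_static_html(raw_html, visible_text):
    """Return True if text seen in the browser is present in the raw HTML."""
    # Decode entities such as &amp; so they compare equal to on-screen text.
    decoded = html.unescape(raw_html)
    return visible_text.casefold() in decoded.casefold()


# Hypothetical response body for a page that fills its list with JavaScript.
STATIC_HTML = (
    '<html><body><div id="article-list"></div>'
    '<script src="/app.js"></script></body></html>'
)

# Hypothetical response body for a page rendered on the server.
SERVER_HTML = (
    '<html><body><div id="article-list">'
    '<a>Police &amp; court news</a></div></body></html>'
)

print(text_in_static_html(STATIC_HTML, "Police & court news"))  # False
print(text_in_static_html(SERVER_HTML, "Police & court news"))  # True
```

With the code from the question, you would pass `page.text` and a headline copied from the browser. If the result is False, the headline was most likely added by JavaScript after the page loaded, and Beautiful Soup alone will not find it.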

Arete · Answer 1 · 2022-01-08T13:41:13.557

This question is too broad to answer fully here. You need to learn to use Selenium WebDriver, or any other method that gives you the HTML after the page's JavaScript has run; you can then parse that HTML with Beautiful Soup.

For example:

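from bs4 import BeautifulSoup
from selenium import webdriver

# Start a real browser so the page's JavaScript runs (requires Chrome).
driver = webdriver.Chrome()
url = "https://www.nst.com.my/news/crime-courts?page=1"
driver.get(url)
# page_source is the page as the browser currently has it.
# If content is still loading at this point, an explicit wait may be needed.
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
print(soup.prettify())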

See https://stackoverflow.com/a/47730866/2154717 or search for "Scrape dynamic web sites with Selenium and Python".
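
The other route suggested in the comments on the question is to call the API that the page itself uses and read its JSON directly. The sketch below shows the general shape only. I do not know the actual endpoint for this site, so no URL is given, and the `data` and `title` field names, the sample payload, and the helper names `fetch_json` and `extract_titles` are all assumptions. You would find the real request URL and response layout in the Network tab of your browser's developer tools.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and decode the response body as JSON."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(payload):
    """Pull article titles out of a decoded payload (hypothetical layout)."""
    # Assumes the articles sit in a list under "data"; items without a
    # "title" key are skipped.
    return [item["title"] for item in payload.get("data", []) if "title" in item]


# Hypothetical payload, standing in for what fetch_json(api_url) might return.
SAMPLE = json.loads(
    '{"data": [{"title": "First article"}, {"title": "Second article"}, {"id": 3}]}'
)
print(extract_titles(SAMPLE))  # ['First article', 'Second article']
```

Once you know the real endpoint, `extract_titles(fetch_json(api_url))` would replace the sample payload, with the key names adjusted to match the actual response. This avoids starting a browser and is usually faster than Selenium.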

Thank you all, I got the data I need by using JSON files. — Ashour Ali, Jan 07 '22 at 17:08
