I would like to scrape crime news articles from the website below, but the soup object does not contain the div tag I need. Could anyone explain why?

import requests
from bs4 import BeautifulSoup 

page = requests.get("https://www.nst.com.my/news/crime-courts?page=1") 
soup = BeautifulSoup(page.text, 'html.parser')
print(soup)
Arete
  • The content is provided dynamically, so look for an API or use Selenium. – HedgeHog Dec 29 '21 at 18:17
  • Oh, whenever you spot data appearing after a bit of loading, it is most likely inserted using JavaScript. This dynamic data is therefore not part of the HTML file that you request. Either look for the API that the page is calling, or look up how to parse dynamic webpages. – Chrimle Dec 29 '21 at 23:22
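
One way to confirm the diagnosis in the comments is to count the target elements in the raw HTML that requests receives. The sketch below is only an illustration: it uses the standard library's html.parser so it runs without extra installs, the class name "article-teaser" is a hypothetical placeholder (the real class has to be read from the browser's inspector), and the two HTML strings are made-up stand-ins for the static and the rendered page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags with a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_elements(html, tag, css_class):
    """Return how many <tag class="... css_class ..."> elements html contains."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up examples: what a JavaScript-driven page might look like
# before and after the browser has run its scripts.
STATIC_HTML = '<div id="app"></div><script src="bundle.js"></script>'
RENDERED_HTML = (
    '<div id="app">'
    '<div class="article-teaser">A</div>'
    '<div class="article-teaser big">B</div>'
    "</div>"
)

static_count = count_elements(STATIC_HTML, "div", "article-teaser")
rendered_count = count_elements(RENDERED_HTML, "div", "article-teaser")
```

If count_elements(page.text, ...) returns 0 for the response from requests.get while the browser's inspector shows the elements, the content is being inserted after the initial request, which is the situation the comments describe.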

1 Answer

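The full answer to this question is too broad to cover here. In short, requests only downloads the initial HTML, and the content you want is added dynamically afterwards. You need to use Selenium WebDriver, or any other method that gives you the rendered HTML source first; then you can parse it with Beautiful Soup.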

For example:

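from bs4 import BeautifulSoup
from selenium import webdriver

# Start a real browser so that the page's JavaScript gets executed
driver = webdriver.Chrome()
url = "https://www.nst.com.my/news/crime-courts?page=1"
driver.get(url)

# page_source holds the DOM as the browser currently sees it
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Close the browser once the HTML has been captured
driver.quit()
print(soup.prettify())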

See https://stackoverflow.com/a/47730866/2154717 or search for "Scrape dynamic web sites with Selenium and Python".
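
If page_source is read before the scripts have finished, the wanted div may still be missing. A possible refinement, sketched here under assumptions, is to wait explicitly for an element before capturing the HTML. The helper names build_page_url and fetch_rendered_html, the 15-second timeout, and the CSS selector in the usage note are all hypothetical; only the URL and its ?page=N pattern come from the question.

```python
BASE_URL = "https://www.nst.com.my/news/crime-courts"


def build_page_url(page):
    """Return the listing URL for a page number, following the ?page=N
    pattern seen in the question."""
    return f"{BASE_URL}?page={page}"


def fetch_rendered_html(url, css_selector, timeout=15):
    """Load url in Chrome, wait until an element matching css_selector
    exists, and return the rendered HTML.

    Raises selenium's TimeoutException if nothing matches in time.
    """
    # Imported here so that build_page_url can be used on a machine
    # without Selenium or a browser driver installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until the dynamically inserted element is in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

A call could look like html = fetch_rendered_html(build_page_url(1), "div.article-teaser"), where "div.article-teaser" is a placeholder for whatever selector the browser's inspector shows for the article blocks; the returned HTML can then be passed to BeautifulSoup(html, 'html.parser') as in the example above.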

Arete