I am trying to scrape a list from the following URL: https://www.oncomap.de/centers?selectedOrgan=Darm&selectedCounty=Deutschland
Using Chrome's Developer Tools, I can see that the content I am interested in is inside body > app-root > app-top > div ... I tried to locate this content with Python's BeautifulSoup4 package. Unfortunately, I cannot reach anything below the app-root tag. I am using the following code:
    import pprint

    import requests
    from bs4 import BeautifulSoup

    headers = {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Max-Age': '3600',
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }

    url = 'https://www.oncomap.de/centers?selectedOrgan=Darm&selectedCounty=Deutschland'
    # headers must be passed by keyword; the second positional argument is params
    req = requests.get(url, headers=headers)
    # the built-in parser is named "html.parser", not "html-parser"
    soup = BeautifulSoup(req.content, "html.parser")

    mat_row = soup.select('body > app-root')
    pp = pprint.PrettyPrinter()
    # print every node nested inside the first app-root element
    for child in mat_row[0].descendants:
        pp.pprint(child)
There is no output from this code: no descendant is printed (I also tried children). I suspect the page is rendered client-side by a JavaScript framework (I assumed ReactJS), so the HTML that requests receives contains only an empty app-root element. Would anyone have any hints on how to process such content? Specifically, I am keen to scrape the main list on the page into a Python-readable table. Thanks for your help!
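
To illustrate what I suspect is happening, here is a minimal sketch that uses only the standard library's html.parser module (the parser behind BeautifulSoup's "html.parser" option). The SHELL_HTML string is made up and is not the actual response from oncomap.de. It only mimics an application shell that JavaScript would fill in later. The DescendantCounter class and the count_inside helper are hypothetical names for this example.

```python
from html.parser import HTMLParser

# Hypothetical server response: an empty application shell whose content
# would normally be inserted by JavaScript running in the browser.
SHELL_HTML = """
<html>
  <body>
    <app-root></app-root>
    <script src="main.js"></script>
  </body>
</html>
"""


class DescendantCounter(HTMLParser):
    """Count tags and non-blank text nodes nested inside a target tag.

    Simplification: void elements without a closing tag (e.g. <br>) are
    not handled, which is fine for the small inputs used here.
    """

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0  # greater than 0 while inside the target element
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.count += 1
            self.depth += 1
        elif tag == self.target:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.count += 1


def count_inside(html, target):
    """Return how many nodes are nested inside the first target element."""
    parser = DescendantCounter(target)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    # The static shell has nothing inside app-root, so there is nothing to scrape.
    print(count_inside(SHELL_HTML, "app-root"))
    # A pre-rendered page would expose its content to a static parser.
    print(count_inside("<app-root><div>x</div></app-root>", "app-root"))
```

If the real response looks like the shell above, no static parser will find the list, and the data would have to come from wherever the browser's JavaScript loads it.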