
I am quite new to using BeautifulSoup. I am trying to scrape the text from the website using the code below, but find_all returns nothing.

import urllib.request
from bs4 import BeautifulSoup

source = urllib.request.urlopen('https://beta.regulations.gov/document/USCIS-2019-0010-9175').read()
soup = BeautifulSoup(source, 'html.parser')  # parse the fetched bytes (there is no `page` variable)
text = soup.find_all(class_="px-2")
print(text)

HTML for the website:

  • Probably the content is retrieved using JavaScript – GiovaniSalazar Jan 04 '20 at 23:42
  • Most likely that content isn't in the original source, but generated by Javascript. You can check the source of the original page to see if it's there. BeautifulSoup can't execute JS, it just looks at the HTML source. – Robin Zigmond Jan 04 '20 at 23:44
  • 1
    I'm voting to close this. The data is generated dynamically on the website, this exact issue has been covered many times before on Stack Overflow. I don't believe it will bring anything of value. – AMC Jan 05 '20 at 02:10
  • Other questions which cover this exact issue: https://stackoverflow.com/q/51117692/11301900, https://stackoverflow.com/q/55351871/11301900, https://stackoverflow.com/q/51984646/11301900, https://stackoverflow.com/q/38334715/11301900, https://stackoverflow.com/q/44867425/11301900, https://stackoverflow.com/q/48313615/11301900, https://stackoverflow.com/q/36060624/11301900, https://stackoverflow.com/q/57226247/11301900, https://stackoverflow.com/q/36854623/11301900, https://stackoverflow.com/q/52102257/11301900, https://stackoverflow.com/q/59205843/11301900. I'm certain that there are many more. – AMC Jan 05 '20 at 02:21
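The comments' point can be shown concretely with a minimal sketch (the HTML snippet below is made up for illustration): BeautifulSoup only finds elements that exist in the markup it is given, so if `find_all` returns an empty list, the element was never in the downloaded source to begin with.

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for the raw HTML a server returns.
# A node inserted later by JavaScript would simply not be in this string.
static_html = '<div class="px-2">static text</div>'

soup = BeautifulSoup(static_html, 'html.parser')
print(soup.find_all(class_="px-2"))     # finds the div: it is in the source
print(soup.find_all(class_="js-only"))  # [] -> same empty result the question saw
```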

1 Answer

As stated in the comments, the data is loaded dynamically via JavaScript. But if you open the Firefox/Chrome network tab, you can see where the data comes from:

import requests

url = 'https://beta.regulations.gov/document/USCIS-2019-0010-9175'
ajax_url = 'https://beta.regulations.gov/api/documentdetails/{}'

document_id = url.split('/')[-1]
data = requests.get(ajax_url.format(document_id)).json()

# from pprint import pprint  # <-- uncomment to see all data
# pprint(data)

print(data['data']['attributes']['content'])

Prints:

Rescind the increase in fees. This is draconian. For all intents and purposes, denying access to this information will prevent many Americans from knowing where they came from. This is an outrage. This is not the mark of a democracy. I strongly disagree with this fee increase
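If you rely on this endpoint, note that the nested keys could change or be absent for some documents. A defensive lookup avoids a `KeyError` in that case; a minimal sketch, using a hypothetical sample payload mirroring the response shape above:

```python
# Hypothetical sample payload shaped like the documentdetails response above.
sample = {'data': {'attributes': {'content': 'Rescind the increase in fees.'}}}

def get_content(payload):
    # Chained .get() calls return None instead of raising KeyError
    # when any level of the expected structure is missing.
    return payload.get('data', {}).get('attributes', {}).get('content')

print(get_content(sample))  # Rescind the increase in fees.
print(get_content({}))      # None
```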