How do I use bs4 to pull similar information from different places in the DOM hierarchy?

Question

I'm trying to scrape information from a series of pages like these two:

https://www.nysenate.gov/legislation/bills/2019/s240

https://www.nysenate.gov/legislation/bills/2019/s8450

What I want to do is build a scraper that can pull down the text of the "See Assembly Version of this Bill" field. On both pages the element has the same classes, but on one page it is the only element with that class, while on the other it is the third.

I'm trying to make something like this work:

assembly_version = soup.select_one(".bill-amendment-detail content active > dd")
print(assembly_version)

But I keep getting `None`.

Any thoughts?
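One likely cause of the `None` is the selector itself. In CSS, spaces are descendant combinators, so `.bill-amendment-detail content active > dd` asks for a `<dd>` under an `<active>` tag under a `<content>` tag. To match one element that carries all three classes, chain them with dots and no spaces. The markup below is a hypothetical stand-in, not copied from nysenate.gov, and the bill number in it is a placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual
# nysenate.gov structure may differ.
html = """
<dl class="bill-amendment-detail content active">
  <dt>See Assembly Version of this Bill:</dt>
  <dd><a href="#">A0000</a></dd>
</dl>
"""
soup = BeautifulSoup(html, "html.parser")

# Spaces mean "descendant of": this looks for <dd> under <active> under
# <content> under .bill-amendment-detail, so it finds nothing.
broken = soup.select_one(".bill-amendment-detail content active > dd")
print(broken)  # None

# Dots with no spaces mean "the same element has all of these classes".
dd = soup.select_one(".bill-amendment-detail.content.active > dd")
print(dd.get_text(strip=True))  # A0000
```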

The entire site is behind `JS`, so you're getting `None` because `BeautifulSoup` doesn't see dynamic content. — baduker, Dec 16 '20 at 15:15
Selenium can see dynamic content if you are able to use it. Check this https://crossbrowsertesting.com/blog/how-to/test-a-dynamic-web-page-selenium/#:~:text=Selenium%20actually%20has%20two%20built,explicit%20and%20the%20implicit%20wait. — Cagri, Dec 16 '20 at 15:17
@Cagri For the first link, by "only" I mean that this class, c-block c-bill-section c-bill--details, appears only once in the HTML tree. For the second link, the same class appears 4 times. (I miscounted the first time around.) — Adam, Dec 16 '20 at 15:23
If the issue is only `BeautifulSoup doesn't see dynamic content` as @baduker mentioned, try doing this: https://stackoverflow.com/questions/15866426/beautifulsoup-not-grabbing-dynamic-content — Cagri, Dec 16 '20 at 15:26
I don't think that's the issue - I can grab that content with bs4, but since it's loaded in different spots on different pages, I can't write a scraper that pulls it down without breaking. — Adam, Dec 16 '20 at 15:29
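If the content is present in the fetched HTML, one position-independent option is to anchor on the label text itself and walk forward to the next link, so it does not matter which block, or how deep, the label sits in. This is a sketch that assumes the Assembly bill link is the first `<a>` after the label in document order, which should be checked against the real pages; `assembly_version` is a hypothetical helper name and the snippets are invented layouts.

```python
import re
from bs4 import BeautifulSoup

LABEL_RE = re.compile(r"See Assembly Version of this Bill", re.IGNORECASE)

def assembly_version(html):
    """Return the text of the first link after the label, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the text node containing the label, wherever it sits in the tree.
    label = soup.find(string=LABEL_RE)
    if label is None:
        return None
    # Assumption: the Assembly bill link is the next <a> in document order.
    link = label.find_next("a")
    return link.get_text(strip=True) if link is not None else None

# Hypothetical snippets with the label at different depths.
shallow = '<p>See Assembly Version of this Bill: <a href="#">A100</a></p>'
deep = """
<section><div><ul><li>Other</li></ul>
  <dl><dt><span>See Assembly Version of this Bill</span></dt>
      <dd><a href="#">A200</a></dd></dl></div></section>
"""
print(assembly_version(shallow))  # A100
print(assembly_version(deep))     # A200
```

Anchoring on visible text trades one kind of fragility for another: it survives layout changes but breaks if the site rewords the label.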

score 0 · Answer 1 · answered Dec 16 '20 at 16:25

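import requests
from bs4 import BeautifulSoup

url = "https://www.nysenate.gov/legislation/bills/2019/s11"
raw_html = requests.get(url).content
soup = BeautifulSoup(raw_html, "html.parser")

# First element whose class attribute is exactly this string, then the text of its first link
assembly_version = soup.find(class_="c-block c-bill-section c-bill--details").find("a").text.strip()
print(assembly_version)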
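One caveat with the code above: `find(class_=...)` with a multi-class string matches on the exact `class` attribute value and returns only the first such element, and the chained `.find("a").text` raises `AttributeError` if either lookup returns `None`. Since the question's second page has four blocks with this class, the first one may not be the one holding the Assembly version. A hedged variant selects every block with a CSS selector (which ignores class order) and keeps the first whose text contains the label. The sample HTML and the `assembly_version_from_blocks` name are hypothetical.

```python
from bs4 import BeautifulSoup

LABEL = "See Assembly Version of this Bill"

def assembly_version_from_blocks(html):
    """Return the first link text in the details block holding LABEL, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Chained classes match elements that carry all three, in any order.
    for block in soup.select(".c-block.c-bill-section.c-bill--details"):
        if LABEL in block.get_text(" ", strip=True):
            link = block.find("a")
            return link.get_text(strip=True) if link is not None else None
    return None

# Hypothetical page: the label sits in the third of four matching blocks,
# and that block lists its classes in a different order.
html = """
<div class="c-block c-bill-section c-bill--details"><a href="#">Sponsor</a></div>
<div class="c-block c-bill-section c-bill--details"><p>Status</p></div>
<div class="c-bill--details c-block c-bill-section">
  <p>See Assembly Version of this Bill: <a href="#">A300</a></p>
</div>
<div class="c-block c-bill-section c-bill--details"><p>Votes</p></div>
"""
print(assembly_version_from_blocks(html))  # A300
```

Returning `None` instead of raising lets a loop over many bill pages skip the ones that have no Assembly version.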
