
I'm trying to scrape information from a series of pages like these two:

https://www.nysenate.gov/legislation/bills/2019/s240

https://www.nysenate.gov/legislation/bills/2019/s8450

I want to build a scraper that pulls down the text under "See Assembly Version of this Bill". On both pages the relevant element has the same classes, but on one page it is the only element with that class, while on the other it is the third.

I'm trying to make something like this work:

assembly_version = soup.select_one(".bill-amendment-detail content active > dd")
print(assembly_version)

But I keep getting `None`.

Any thoughts?

Adam
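The `None` most likely comes from the selector syntax: in CSS, a space is a descendant combinator, so `.bill-amendment-detail content active > dd` looks for a `<content>` tag inside `.bill-amendment-detail`, then an `<active>` tag inside that. It does not look for an element with three classes. Joining the class names with dots and no spaces forms one compound selector that matches a single element carrying all three classes. The sketch below assumes hypothetical, simplified markup in which the three classes sit on the `<dl>` that directly contains the `<dd>`. The real nysenate.gov structure may differ, and `A1111` is a placeholder value.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
HTML = """
<dl class="bill-amendment-detail content active">
  <dt>See Assembly Version of this Bill:</dt>
  <dd><a href="#">A1111</a></dd>
</dl>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Spaces are descendant combinators: this searches for <content> and <active>
# tags, which do not exist, so nothing matches.
broken = soup.select_one(".bill-amendment-detail content active > dd")

# Dots with no spaces: one element with all three classes, with a <dd>
# as a direct child.
fixed = soup.select_one(".bill-amendment-detail.content.active > dd")

print(broken)                      # None
print(fixed.get_text(strip=True))  # A1111
```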
  • What do you mean by 'only' and 'third iteration'? – Cagri Dec 16 '20 at 15:13
  • The entire site is behind `JS` so you're getting `None` because `BeautifulSoup` doesn't see dynamic content. – baduker Dec 16 '20 at 15:15
  • Selenium can see dynamic content if you are able to use it. Check this https://crossbrowsertesting.com/blog/how-to/test-a-dynamic-web-page-selenium/#:~:text=Selenium%20actually%20has%20two%20built,explicit%20and%20the%20implicit%20wait. – Cagri Dec 16 '20 at 15:17
  • @Cagri For the first link, by 'only' I mean that the class c-block c-bill-section c-bill--details appears only once in the HTML tree. For the second link, that same class appears 4 times. (I miscounted the first time around.) – Adam Dec 16 '20 at 15:23
  • If the issue is only `BeautifulSoup doesn't see dynamic content` as @baduker mentioned, try doing this: https://stackoverflow.com/questions/15866426/beautifulsoup-not-grabbing-dynamic-content – Cagri Dec 16 '20 at 15:26
  • I don't think that's the issue - I can grab that content with bs4, but since it's loaded in different spots on different pages, I can't write a scraper that pulls it down without breaking. – Adam Dec 16 '20 at 15:29

1 Answer

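import requests
from bs4 import BeautifulSoup

url = "https://www.nysenate.gov/legislation/bills/2019/s11"
raw_html = requests.get(url).content
soup = BeautifulSoup(raw_html, "html.parser")

# class_ with a space-separated string matches the exact class attribute value;
# find() returns the first such block, and .find("a") the first link inside it
assembly_version = soup.find(class_="c-block c-bill-section c-bill--details").find("a").text.strip()
print(assembly_version)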
Adam
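The answer's `find()` returns only the first `c-block c-bill-section c-bill--details` block. On a page where that class occurs several times (four times on the second page, per the comments), it can pick up an unrelated link if the label is not in the first block. One position-independent alternative is to search every `<dt>` for the "See Assembly Version of this Bill" label and read the `<dd>` next to it. This is a sketch under assumptions. The `<dl>/<dt>/<dd>` layout, the hypothetical `page()` helper, the "Other detail:" labels, and the `A1111`/`A2222` values are stand-ins, not taken from nysenate.gov.

```python
from bs4 import BeautifulSoup

LABEL = "See Assembly Version of this Bill"


def assembly_version(soup):
    """Return the text of the <dd> after the label's <dt>, or None if absent.

    Scans every <dt> on the page, so it does not matter whether the label
    sits in the first, third, or only details block.
    """
    for dt in soup.find_all("dt"):
        if LABEL in dt.get_text():
            dd = dt.find_next_sibling("dd")
            return dd.get_text(strip=True) if dd else None
    return None


def page(pairs):
    """Build hypothetical markup: one details block per (label, value) pair."""
    return "".join(
        '<div class="c-block c-bill-section c-bill--details"><dl>'
        f"<dt>{label}</dt><dd><a href='#'>{value}</a></dd></dl></div>"
        for label, value in pairs
    )


one = page([(LABEL + ":", "A1111")])
four = page([("Other detail:", "x"), ("Other detail:", "y"),
             (LABEL + ":", "A2222"), ("Other detail:", "z")])

for html in (one, four):
    print(assembly_version(BeautifulSoup(html, "html.parser")))
# A1111
# A2222

# For comparison, the first-block approach picks up the first link it finds.
first = BeautifulSoup(four, "html.parser").find(class_="c-block c-bill-section c-bill--details")
print(first.find("a").text.strip())  # x
```

Matching on the label text rather than on position means the same code works whether the label is in the only block or the third of four, and it returns `None` instead of raising `AttributeError` when the label is missing.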