I am trying to pull data from this URL: https://99airdrops.com/page/1/. The code I have written is below.
import requests
from bs4 import BeautifulSoup

url_str = 'https://99airdrops.com/page/1/'
page = requests.get(url_str, headers={'User-Agent': 'Mozilla Firefox'})

# soup = BeautifulSoup(page.text, 'lxml')
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup.prettify())

print(len(soup.find_all('div')))
print(soup.find('div', class_='title'))
My issue is that print(len(soup.find_all('div'))) returns only 23, and print(soup.find('div', class_='title')) prints None. The find call isn't locating the div element with class_='title' even though the page contains several such elements. The div is nested deeply in the HTML, but that has never caused me problems before.
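For what it's worth, nesting depth by itself shouldn't stop BeautifulSoup, since find() searches the whole tree recursively. A minimal sketch with invented markup (only the class name matches my case, the rest is made up for illustration):

```python
from bs4 import BeautifulSoup

# Invented markup for illustration: a div with class "title" buried
# several levels deep in the document.
html = """
<html><body>
  <div><section><article>
    <div class="title">Airdrop Name</div>
  </article></section></div>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
# find() walks the entire tree, so nesting depth doesn't matter here.
print(soup.find('div', class_='title').get_text(strip=True))  # prints "Airdrop Name"
```

So a deeply nested element is found without any trouble when it is actually present in the HTML the parser receives.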
I've tried both the lxml and html.parser parsers, but neither returns all of the div elements. I also tried writing the HTML to a file, reading it back in, and running BeautifulSoup on that, but I got the same results. Could someone tell me what the issue is here?
I also tried the suggestion in Beautiful Soup - `findAll` not capturing all tags in SVG (`ElementTree` does) to update my lxml package, but I still run into the same issue. I also tried the solutions in BeautifulSoup doesn't find correctly parsed elements, with no luck.
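As a further sanity check, comparing a raw string count of opening div tags against what the parser returns can separate two failure modes: the parser dropping elements versus the elements never being in the response at all (e.g. injected client-side by JavaScript). A sketch on an inline sample string; for the real page one would substitute page.text:

```python
from bs4 import BeautifulSoup

# Inline stand-in for page.text; for the real page, use the downloaded HTML.
html = '<div><p>a</p></div><div class="title">b</div><div>c</div>'

raw_count = html.count('<div')  # opening div tags in the raw markup
parsed_count = len(BeautifulSoup(html, 'html.parser').find_all('div'))

# If raw_count were much larger than parsed_count, the parser would be
# dropping elements; if both are small for the real page, the missing divs
# are probably added by JavaScript after load and never reach requests.
print(raw_count, parsed_count)
```

On this sample both counts are 3, as expected for well-formed markup.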