I am trying to pull data from this URL: https://99airdrops.com/page/1/. The code I have written is below.
import requests
from bs4 import BeautifulSoup

url_str = 'https://99airdrops.com/page/1/'
page = requests.get(url_str, headers={'User-Agent': 'Mozilla Firefox'})

# soup = BeautifulSoup(page.text, 'lxml')
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup.prettify())

print(len(soup.find_all('div')))
print(soup.find('div', class_='title'))
My issue is that print(len(soup.find_all('div'))) returns only 23, and print(soup.find('div', class_='title')) prints None. The find call isn't locating the div element with class_='title' even though the page contains several such elements. The div is nested deeply in the HTML, but that has never caused me problems before.
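For what it's worth, nesting depth by itself shouldn't stop BeautifulSoup, since find() searches the whole tree recursively. A minimal sketch with invented markup (only the class name matches my case, the rest is made up for illustration):

```python
from bs4 import BeautifulSoup

# Invented markup for illustration: a div with class "title" buried
# several levels deep in the document.
html = """
<html><body>
  <div><section><article>
    <div class="title">Airdrop Name</div>
  </article></section></div>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
# find() walks the entire tree, so nesting depth doesn't matter here.
print(soup.find('div', class_='title').get_text(strip=True))  # prints "Airdrop Name"
```

So a deeply nested element is found without any trouble when it is actually present in the HTML the parser receives.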
I've tried both the lxml and html.parser parsers, but neither returns all of the div elements. I also tried writing the HTML to a file, reading it back in, and running BeautifulSoup on that, but I got the same results. Could someone tell me what the issue is here?
I also tried the suggestion in Beautiful Soup - `findAll` not capturing all tags in SVG (`ElementTree` does) to update my lxml package, but I still run into the same issue. I also tried the solutions in BeautifulSoup doesn't find correctly parsed elements, with no luck.
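As a further sanity check, comparing a raw string count of opening div tags against what the parser returns can separate two failure modes: the parser dropping elements versus the elements never being in the response at all (e.g. injected client-side by JavaScript). A sketch on an inline sample string; for the real page one would substitute page.text:

```python
from bs4 import BeautifulSoup

# Inline stand-in for page.text; for the real page, use the downloaded HTML.
html = '<div><p>a</p></div><div class="title">b</div><div>c</div>'

raw_count = html.count('<div')  # opening div tags in the raw markup
parsed_count = len(BeautifulSoup(html, 'html.parser').find_all('div'))

# If raw_count were much larger than parsed_count, the parser would be
# dropping elements; if both are small for the real page, the missing divs
# are probably added by JavaScript after load and never reach requests.
print(raw_count, parsed_count)
```

On this sample both counts are 3, as expected for well-formed markup.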