
I am getting an AttributeError while scraping a web page with BeautifulSoup. I want to get a specific text from the web page with a specific selector, but I'm getting this error. Can anyone tell me how to properly use the CSS selectors like I have given below?

Error:

AttributeError: 'NoneType' object has no attribute 'text' 

CSS selector

#col-body > div > social-influence > div.row.row-zero.influence-others.panel-inactive > div:nth-child(3) > h4

My code

from bs4 import BeautifulSoup
import requests

html = requests.get("https://www.cryptocompare.com/coins/bnb/influence/USDT").text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one("#col-body div social-influence div.row.row-zero.influence-others.panel-inactive div:nth-child(3) h4").text
print(total_commit)

Expected output:

We don't have any code repository data yet.
CrazyChucky
  • Is there a reason you're using a different CSS selector in your code than the one you show separately? `>` means to find an element *directly* below, while a space means it will keep looking deeper into the tree. – CrazyChucky Oct 26 '21 at 22:45
  • @CrazyChucky They are the same selector; I just removed `>` to make it work. Is that wrong? I copied and pasted the selector directly from Inspect Element. I used to work with Puppeteer in Node.js, where I could paste the selector as-is and it worked, but in BS4 I have to remove `>`. Can you tell me the correct way to use a CSS selector in this scenario? – Siddharth Tiwari Oct 27 '21 at 04:01
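
To illustrate the difference between `>` and a space raised in the comments above, here is a minimal sketch. The markup is hypothetical and is not taken from the real page; only `select` from Beautiful Soup is used.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only (not the real page).
html = """
<div id="outer">
  <section>
    <h4>nested heading</h4>
  </section>
  <h4>direct heading</h4>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): matches h4 at any depth below #outer.
descendants = [h.text for h in soup.select("#outer h4")]

# Child combinator (>): matches only h4 elements that are direct children.
children = [h.text for h in soup.select("#outer > h4")]

print(descendants)  # ['nested heading', 'direct heading']
print(children)     # ['direct heading']
```

Anything matched by the `>` form is also matched by the space form, so dropping `>` can only widen the match. With `select_one`, which returns the first match in document order, a wider selector may therefore return a different element than the stricter one.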

2 Answers

2

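Your selector does not match anything, so `select_one` returns `None`, and accessing `.text` on `None` raises the AttributeError. Check the result before using it: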

total_commit_node = soup.select_one(
    '#col-body div social-influence div.row.row-zero.influence-others.panel-inactive div:nth-child(3) h4')

if total_commit_node:
    print(total_commit_node.text)
else:
    print('Could not match css selector')
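
If the same check is needed in several places, it could be wrapped in a small helper. This is only a sketch: `select_text` is a hypothetical name, not part of Beautiful Soup, and the sample markup is made up.

```python
from bs4 import BeautifulSoup


def select_text(soup, selector, default=None):
    """Return the stripped text of the first match, or `default` if none.

    Hypothetical helper, not a Beautiful Soup API.
    """
    node = soup.select_one(selector)
    if node is None:
        # select_one returns None when nothing matches the selector.
        return default
    return node.get_text(strip=True)


# Made-up markup for illustration.
sample = BeautifulSoup('<div id="a"><h4> hello </h4></div>', "html.parser")

found = select_text(sample, "#a > h4")
missing = select_text(sample, "#a > h5", default="no match")

print(found)    # hello
print(missing)  # no match
```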
Mr.Manhattan
  • i have edited sorry for the inconvenience – Siddharth Tiwari Oct 26 '21 at 14:37
  • not working for me – Siddharth Tiwari Oct 26 '21 at 14:48
  • Mr.Manhattan, is the question in its edited version still the question which you have answered? If not, OP should undo the question edit and in that case the copy elsewhere of the current version can be reopened. – Yunnosch Oct 26 '21 at 16:56
  • @Yunnosch The answer still correctly applies to the question. (It doesn't address the second part—how to use CSS selectors—but it covers why the error is occurring, which is the same both before and after the edit.) – CrazyChucky Oct 26 '21 at 22:12
  • @SiddharthTiwari What do you mean "not working"? This answer may not cover everything you asked, but it's correct as far as it goes. And there's no need to ask them to remove it; other people can still add answers if they want to. – CrazyChucky Oct 26 '21 at 22:12
0

As it turns out, there's nothing wrong with your CSS selector (either the original version you show, or the modified version used in your code; both select the same content). The problem is that the portion of the page you want is dynamically generated by JavaScript, so it's not in the HTML. To see this, try:

from bs4 import BeautifulSoup
import requests

html = requests.get(
    'https://www.cryptocompare.com/coins/bnb/influence/USDT'
).text
with open('output.txt', 'w') as f:
    f.write(html)

If you search the resulting file for "We don't have any code repository data yet.", you'll see that it's not there.
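
The same search can be done in code rather than by opening the file. The sketch below runs offline: `appears_in_source` is a hypothetical helper, and the two HTML strings are made-up stand-ins for the raw response and the JavaScript-rendered page, not the site's actual markup.

```python
from html import unescape


def appears_in_source(page_html: str, needle: str) -> bool:
    """Return True if `needle` occurs in the HTML text.

    Entities are unescaped first, so an apostrophe served as &#39;
    still matches. Hypothetical helper for illustration.
    """
    return needle in unescape(page_html)


needle = "We don't have any code repository data yet."

# Made-up stand-in for what requests.get(...).text might return:
# the custom element is empty until JavaScript fills it in.
static_html = (
    "<html><body><social-influence></social-influence>"
    '<script src="app.js"></script></body></html>'
)

# Made-up stand-in for the page after JavaScript has run.
rendered_html = (
    "<html><body><social-influence>"
    "<h4>We don&#39;t have any code repository data yet.</h4>"
    "</social-influence></body></html>"
)

print(appears_in_source(static_html, needle))    # False
print(appears_in_source(rendered_html, needle))  # True
```

Passing the real `html` from the snippet above as `page_html` would perform the check against the actual response.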

So we need to not only fetch the HTML page, but also run its JavaScript and save the resulting page's source. You can do this with Selenium, as explained in this answer. You will first have to install Selenium (`pip install selenium` should probably work) and download the geckodriver executable appropriate for your OS.

With that done, your code (with Mr.Manhattan's addition) would look like this:

from contextlib import closing

from bs4 import BeautifulSoup
from selenium.webdriver import Firefox

with closing(Firefox()) as browser:
    browser.get('https://www.cryptocompare.com/coins/bnb/influence/USDT')
    html = browser.page_source

soup = BeautifulSoup(html, 'html.parser')
total_commit_node = soup.select_one(
    '#col-body > div > social-influence > '
    'div.row.row-zero.influence-others.panel-inactive > div:nth-child(3) > h4'
)

if total_commit_node:
    print(total_commit_node.text)
else:
    print('Could not match CSS selector')

(Note: I split the selector onto two lines to avoid scrolling. It functions exactly the same, since consecutive string literals are automatically concatenated.)

For me, this correctly outputs:

We don't have any code repository data yet.
CrazyChucky
  • @SiddharthTiwari Did this resolve your issue? If you feel that it answered your question, please consider accepting it by clicking the checkmark. – CrazyChucky Dec 06 '21 at 22:38