Get value of span tag using BeautifulSoup

Question

I have a number of Facebook groups whose member counts I would like to get. An example is this group: https://www.facebook.com/groups/347805588637627/ Using inspect element on the page, I can see the count is stored like so:

<span id="count_text">9,413 members</span>

I am trying to get "9,413 members" out of the page. I have tried using BeautifulSoup but cannot work it out.

Thanks

Edit:

from bs4 import BeautifulSoup
import requests

url = "https://www.facebook.com/groups/347805588637627/"
r  = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
span = soup.find("span", id="count_text")
print(span.text)

Print `data` and you will notice there is no such tag. – 宏杰李 Feb 11 '17 at 12:36

score 8 · Answer 1 · answered Feb 11 '17 at 11:32


If there is more than one span tag in the page, select the one you want by its id:

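from bs4 import BeautifulSoup
# your_html_input is a placeholder for the HTML string you want to parse
soup = BeautifulSoup(your_html_input, 'html.parser')
# find() returns the first span with id="count_text", or None if there is none
span = soup.find("span", id="count_text")
span.text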


Henrik


Thank you. I tried this but got AttributeError: 'NoneType' object has no attribute 'text'. I have updated my question with the code I am using. – newaccount1111 Feb 11 '17 at 11:56
`soup.find()` is not finding any results, so you're calling `.text` on `None`. Try looking at the same webpage in incognito mode in your browser; the element you're looking for is not displayed when not logged in. – Henrik Feb 11 '17 at 12:48
Thanks, that makes complete sense. It looks like I will have to find another way to get the member count of a Facebook group from its URL. Is it possible to be logged in and then use BeautifulSoup somehow? – newaccount1111 Feb 11 '17 at 15:26
Yes, you are not the first person facing this issue. Here's an example of a similar question with several solutions: http://stackoverflow.com/questions/21928368/login-to-facebook-using-python-requests – Henrik Feb 11 '17 at 16:37
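
The failure discussed in these comments can be reproduced and guarded against without contacting Facebook. A minimal sketch, assuming two made-up HTML snippets (`logged_in_html` and `logged_out_html` are hypothetical stand-ins, not real Facebook markup) and a hypothetical helper named `member_count_text`:

```python
from bs4 import BeautifulSoup


def member_count_text(html):
    """Return the text of <span id="count_text">, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id="count_text")
    # find() returns None when nothing matches, so check before reading the text
    if span is None:
        return None
    return span.get_text(strip=True)


# Hypothetical inputs standing in for the logged-in and logged-out pages
logged_in_html = '<div><span id="count_text">9,413 members</span></div>'
logged_out_html = '<div><span>Log in to see this group</span></div>'

print(member_count_text(logged_in_html))   # 9,413 members
print(member_count_text(logged_out_html))  # None
```

Checking for `None` turns the AttributeError into a value the caller can handle, which makes it easier to tell "element missing from the response" apart from a parsing mistake.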

score 3 · Answer 2 · answered Feb 11 '17 at 11:24


You can use the text attribute of the parsed span:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<span id="count_text">9,413 members</span>', 'html.parser')
>>> soup.span
<span id="count_text">9,413 members</span>
>>> soup.span.text
'9,413 members'
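
Since the question asks for the count itself, the extracted text can be turned into an integer. This is only a sketch, assuming the text always starts its number with a digit and uses commas as thousands separators; the variable names `text`, `match`, and `member_count` are illustrative:

```python
import re

from bs4 import BeautifulSoup

soup = BeautifulSoup('<span id="count_text">9,413 members</span>', "html.parser")
text = soup.find("span", id="count_text").get_text()

# Grab the first run of digits and commas, then drop the commas
match = re.search(r"\d[\d,]*", text)
member_count = int(match.group().replace(",", "")) if match else None

print(member_count)  # 9413
```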


Balthazar Rouberol


This works perfectly as it is, but I still can't get it to work on the actual page. I am new to BeautifulSoup and have updated my question with the code I am using. Thanks – newaccount1111 Feb 11 '17 at 11:57

score 2 · Answer 3 · answered Jun 24 '19 at 17:16


If you have more than one span tag, you can loop over all of them:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

tags = soup('span')

for tag in tags:
    # get_text() also works for empty spans, where tag.contents[0] raises IndexError
    print(tag.get_text())
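
When only one of several spans is wanted, it can be picked out directly instead of looping. A small sketch on a made-up HTML fragment (the `html` string and its `title` class are assumptions for illustration), using `find_all` to list every span and the CSS selector `span#count_text` to pick one by id:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with several span tags, one of them empty
html = """
<div>
  <span class="title">Example group</span>
  <span id="count_text">9,413 members</span>
  <span></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Text of every span; an empty span yields an empty string
texts = [tag.get_text(strip=True) for tag in soup.find_all("span")]
print(texts)

# select_one() returns the first match for a CSS selector, or None
count_span = soup.select_one("span#count_text")
print(count_span.get_text() if count_span is not None else "not found")
```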


score 1 · Answer 4 · answered Jun 09 '19 at 15:54


Facebook uses JavaScript to prevent bots from scraping. You need to use Selenium to extract the data in Python.


Trect

