
I have the code below, and I am trying to get 'Oswestry, England' as the result.

label = soup.find_all('span',{'class':"ProfileHeaderCard-locationText"})
print(label)

But it doesn't give me just the text value.

Here is what the HTML looks like:

<span class="ProfileHeaderCard-locationText u-dir" dir="ltr">
    <a data-place-id="5b756a1991aa8648" href="/search?q=place%3A5b756a1991aa8648">Oswestry, England</a>
</span>

When I print label, the result is the HTML I posted above. Here is my full code:

import requests as req
from bs4 import BeautifulSoup

usernames = []  # list of usernames

location_list = []

for x in usernames:
    url = "https://twitter.com/" + x
    try:
        html = req.get(url)
    except Exception as e:
        print("Failed to fetch " + url + ": " + str(e))
        continue
    soup = BeautifulSoup(html.text, 'html.parser')
    try:
        label = soup.find('span',{'class':"ProfileHeaderCard-locationText"})
        label_formatted = label.string.lstrip()
        label_formatted = label_formatted.rstrip()
        if label_formatted != "":
            location_list.append(label_formatted)
            print(x + ' : ' + label_formatted) 
        else:
            print('Not found')
    except:
        print('Not found')
Mtrinidad

3 Answers


You should call find, not find_all, to get a single element. Then use the .text attribute to get its text content.

label = soup.find('span',{'class':"ProfileHeaderCard-locationText"})
print(label.text)
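For illustration, here is a self-contained sketch that parses the HTML snippet from the question as a string instead of fetching the page. It shows that find_all returns a list-like ResultSet (which is why printing it shows the markup) while find returns a single Tag. It also shows that .string is None for this span, because the span holds whitespace plus an <a> child. That is likely why the full loop in the question ends up printing 'Not found'. get_text(strip=True) trims the whitespace that .text keeps.

```python
from bs4 import BeautifulSoup

# HTML copied from the question; parsed locally instead of fetched.
HTML = """
<span class="ProfileHeaderCard-locationText u-dir" dir="ltr">
    <a data-place-id="5b756a1991aa8648" href="/search?q=place%3A5b756a1991aa8648">Oswestry, England</a>
</span>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all returns a ResultSet (a list subclass), so printing it shows markup.
matches = soup.find_all("span", {"class": "ProfileHeaderCard-locationText"})
print(type(matches).__name__, len(matches))

# find returns the first matching Tag (or None if nothing matches).
label = soup.find("span", {"class": "ProfileHeaderCard-locationText"})

# .string is None because the span has several children:
# a whitespace string, the <a> tag, and another whitespace string.
print(label.string)

# .text keeps the surrounding whitespace; get_text(strip=True) removes it.
print(repr(label.text))
location = label.get_text(strip=True)
print(location)  # Oswestry, England
```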
Barmar

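It may look as though the search requires the class attribute to match your query exactly, so that a span with two classes would be skipped. BeautifulSoup, however, treats class as a multi-valued attribute: a filter like {'class': "ProfileHeaderCard-locationText"} matches any span that has that class among its classes, which is why printing label shows the span's HTML. CSS selectors express the same match explicitly.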

Using CSS selectors, you could write the search as:

from bs4 import BeautifulSoup as BS
soup = BS('''<span class="ProfileHeaderCard-locationText u-dir">.....</span>''', 'html.parser')
soup.select('span.ProfileHeaderCard-locationText')

This returns the span tags whose class list includes ProfileHeaderCard-locationText.
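A minimal sketch comparing the dictionary filter and the CSS selector on a shortened version of the question's snippet. Both find the span even though it has two classes. select_one returns just the first match. Passing the whole attribute string as class_ also works, but only when the classes appear in exactly that order.

```python
from bs4 import BeautifulSoup

HTML = (
    '<span class="ProfileHeaderCard-locationText u-dir" dir="ltr">'
    '<a href="/search?q=place%3A5b756a1991aa8648">Oswestry, England</a>'
    "</span>"
)
soup = BeautifulSoup(HTML, "html.parser")

# Dictionary filter: matches if ANY of the span's classes equals the value.
by_dict = soup.find_all("span", {"class": "ProfileHeaderCard-locationText"})

# CSS selector: matches spans whose class list includes the given class.
by_css = soup.select("span.ProfileHeaderCard-locationText")

# select_one returns the first matching Tag, or None.
first = soup.select_one("span.ProfileHeaderCard-locationText")

# Matching the full attribute string only works for the exact string,
# so the same classes in a different order find nothing.
exact = soup.find_all("span", class_="ProfileHeaderCard-locationText u-dir")
reversed_order = soup.find_all("span", class_="u-dir ProfileHeaderCard-locationText")

print(len(by_dict), len(by_css), len(exact), len(reversed_order))
print(first.get_text(strip=True))
```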


Anton Pomieshchenko
ml_dave

For anyone having the same problem, I was able to get the inner text from the HTML by doing this:

label2 = soup.findAll('span',{"class":"ProfileHeaderCard-locationText"})[0].get_text()
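The same idea can be folded into the question's loop. Below is a minimal sketch that works on HTML you have already downloaded. The helper name extract_location is hypothetical, and it assumes the profile page still contains the span shown in the question. Indexing findAll(...)[0] raises IndexError when nothing matches, and a bare except hides unrelated errors. Checking for None avoids both problems.

```python
from typing import Optional

from bs4 import BeautifulSoup

LOCATION_CLASS = "ProfileHeaderCard-locationText"


def extract_location(html: str) -> Optional[str]:
    """Return the profile location text, or None if it is missing or empty.

    Hypothetical helper: assumes the page contains the span from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find("span", {"class": LOCATION_CLASS})
    if label is None:
        # No matching span; findAll(...)[0] would raise IndexError here.
        return None
    # get_text(strip=True) drops the whitespace around the <a> text.
    text = label.get_text(strip=True)
    return text or None


# Usage with HTML that was already fetched (e.g. the text of a response).
sample = (
    '<span class="ProfileHeaderCard-locationText u-dir" dir="ltr">\n'
    '  <a href="/search?q=place%3A5b756a1991aa8648">Oswestry, England</a>\n'
    "</span>"
)
print(extract_location(sample))
print(extract_location("<span class='other'>x</span>"))
print(extract_location('<span class="ProfileHeaderCard-locationText">   </span>'))
```

In the question's loop, location = extract_location(html.text) could then replace the try/except block around label.string.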

Mtrinidad