BeautifulSoup: How to grab the contents of an <a> tag that's within an <h1>?

Question

I have been trying to extract the name from a Twitter profile. The only problem I'm having is that BeautifulSoup grabs the entire element. I have tried passing {"class":} to specify the element, but whenever I do this I get the following error:

AttributeError: 'NoneType' object has no attribute 'text'

My code:

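import urllib.request
from bs4 import BeautifulSoup

url = "https://twitter.com/barackobama"
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# .text on the <h1> returns the text of the whole header
name = soup.find('h1').text
print(name)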

Note that scraping the Twitter website is against the Twitter Terms of Service, and may result in your IP address being blocked. — Andy Piper, Nov 16 '18 at 19:39

Andersson · Accepted Answer · 2018-11-16T16:30:14.537

3

If you want the text of the link nested inside the header rather than the complete header text, try:

import urllib.request
from bs4 import BeautifulSoup

url = "https://twitter.com/barackobama"
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# .a selects the first <a> tag inside the <h1>
name = soup.find('h1').a.text
print(name)
# 'Barack Obama'
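
To see the difference between the header text and the link text without fetching a live page, here is a minimal sketch on an inline HTML string. The markup is hypothetical (it is not the real profile page, whose structure may differ), and it uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may look different.
html = (
    '<h1 class="name">'
    '<a href="/barackobama">Barack Obama</a> '
    '<span>Verified account</span>'
    '</h1>'
)

soup = BeautifulSoup(html, "html.parser")
header = soup.find("h1")

full_text = header.text    # concatenated text of everything inside <h1>
link_text = header.a.text  # text of the first <a> inside <h1> only

print(repr(full_text))
print(repr(link_text))
```

Here `header.text` includes the text of the `<span>` as well, while `header.a.text` is limited to the link.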

edited Nov 16 '18 at 16:30

answered Nov 16 '18 at 16:22

Andersson


This would not fix the error described in the question, even if OP seems to have thought it worked. `'NoneType' object has no attribute 'text'` implies that `soup.find('h1')` returned `None`, in which case `.a` would **also** not work. – Karl Knechtel Mar 28 '23 at 10:57
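
As a rough sketch of the point in this comment: `find()` returns `None` when nothing matches, so checking the result before chaining `.a` or `.text` avoids the `AttributeError`. The helper name `link_text_in_header` and the sample HTML strings below are made up for illustration and are not from the original posts.

```python
from bs4 import BeautifulSoup


def link_text_in_header(html, header_tag="h1"):
    """Return the text of the first <a> inside the first header_tag, or None."""
    soup = BeautifulSoup(html, "html.parser")

    header = soup.find(header_tag)
    if header is None:
        # No matching header in the document, so there is nothing to read.
        return None

    link = header.find("a")
    if link is None:
        # The header exists but contains no link.
        return None

    return link.get_text(strip=True)


# Hypothetical inputs: one with an <h1>, one without.
with_header = '<h1><a href="/barackobama">Barack Obama</a></h1>'
without_header = '<div><a href="/barackobama">Barack Obama</a></div>'

print(link_text_in_header(with_header))
print(link_text_in_header(without_header))
```

If the second call returns `None` for a page you fetched, the HTML you received probably does not contain the tag you expected, which is worth checking by printing the response before changing the selector.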
