I have been trying to extract the name from a Twitter profile. The only problem is that BeautifulSoup grabs the text of the entire element. I have tried using {"class":} to specify the element, but whenever I do this I get the following error:

AttributeError: 'NoneType' object has no attribute 'text'

My code:

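import urllib.request
from bs4 import BeautifulSoup

url = "https://twitter.com/barackobama"
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# .text on the <h1> returns the text of the whole element, children included
name = soup.find('h1').text
print(name)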
Andersson
Tim0thy
  • Note that scraping the Twitter website is against the Twitter Terms of Service, and may result in your IP address being blocked. – Andy Piper Nov 16 '18 at 19:39

1 Answer

If you want the text of the link inside the header rather than the complete header text, try:

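import urllib.request
from bs4 import BeautifulSoup

url = "https://twitter.com/barackobama"
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, 'lxml')

# Take the text of the <a> inside the <h1> rather than of the whole header
name = soup.find('h1').a.text
print(name)
# 'Barack Obama'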
Andersson
  • This would not fix the error described in the question, even if OP seems to have thought it worked. `'NoneType' object has no attribute 'text'` implies that `soup.find('h1')` returned `None`, in which case `.a` would **also** not work. – Karl Knechtel Mar 28 '23 at 10:57
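
The point in the comment above can be illustrated with a small sketch that checks the result of `find` before touching `.text`. It runs on inline HTML strings instead of a live page, so the markup, the `extract_name` helper, and the use of `html.parser` are assumptions made for illustration only, not the structure Twitter actually serves.

```python
from bs4 import BeautifulSoup


def extract_name(html):
    """Return the text of the link inside the first <h1>, or None if there is no <h1>.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("h1")
    if header is None:
        # find() returns None when nothing matches, so .text or .a would raise
        # AttributeError here. Report the miss explicitly instead.
        return None
    link = header.find("a")
    # Fall back to the header itself if it has no <a> child.
    target = link if link is not None else header
    return target.get_text(strip=True)


# Made-up markup: one page with the expected header, one without it.
with_header = '<h1><a href="/barackobama">Barack Obama</a> <span>Verified</span></h1>'
without_header = "<div>No header here</div>"

print(extract_name(with_header))
print(extract_name(without_header))
```

If `extract_name` returns `None` for the real page, the element is simply not in the HTML that was downloaded, and no change to the selector after `find` will help.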