I am trying to check the status of an account on Twitter.com. The site does not appear to use stable container names (I think they are dynamically generated), so I am matching on text strings instead. Inspired by this question, I expected the following code to work, but it returns an empty list:
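import requests
from bs4 import BeautifulSoup

# Fetch the HTML exactly as the server sends it (no JavaScript is executed).
page = requests.Session().get('https://twitter.com/MikeEPeinovich')
page = page.content
soup = BeautifulSoup(page, "lxml")
# Look for a text node that is exactly "Account suspended".
print(soup.find_all(string="Account suspended"))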
...and here's a variation using a different request library and HTML parser (same end result though):
import urllib2
from bs4 import BeautifulSoup

# Same request through urllib2 (Python 2), parsed with the built-in parser.
page = urllib2.urlopen('https://twitter.com/MikeEPeinovich')
soup = BeautifulSoup(page, "html.parser")
print(soup.find_all(string="Account suspended"))
Any advice on what I'm doing wrong? Thanks!
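As a sanity check on the matching itself, here is a minimal sketch run against a hypothetical snippet (SAMPLE_HTML is made up for illustration and is not Twitter's real markup). It shows that passing a plain string only matches a text node equal to that string, while a compiled regular expression matches anywhere inside the node. I don't know whether this is what is happening on the real page, but it seems worth ruling out:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the phrase sits inside a longer, whitespace-padded string.
SAMPLE_HTML = """
<html><body>
  <div><span>
    Account suspended. Learn more.
  </span></div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string only matches a text node that equals it exactly.
exact = soup.find_all(string="Account suspended")

# A compiled pattern is applied with search(), so it matches anywhere in the node.
partial = soup.find_all(string=re.compile("Account suspended"))

print(len(exact))
print(len(partial))
```

If the phrase is present in the fetched HTML but surrounded by other text or whitespace, the exact match would come back empty even though the pattern match finds it.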
UPDATE
It was rightly pointed out to me below that I need something like Selenium to mimic browser behaviour and capture the fully loaded, dynamic page, so I've integrated Selenium and Firefox (Mozilla's Gecko-based browser) into the script. On inspecting the soup object, though, I'm still clearly not grabbing everything. This is the script I'm using now:
# With Selenium
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as FirefoxOptions

url = "https://twitter.com/MikeEPeinovich"

# Run Firefox without opening a window.
options = FirefoxOptions()
options.add_argument("--headless")
browser = webdriver.Firefox(options=options)

# page_source is read as soon as get() returns.
browser.get(url)
html = browser.page_source
browser.quit()  # close the browser so no Firefox process is left behind

soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(string="Account suspended"))
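One possibility is that page_source is being read before the page's scripts have finished rendering. The following is only a sketch under that assumption: it waits (up to a timeout) for the phrase to appear in the body before grabbing the source, and then checks the page text with whitespace normalised. The helper names phrase_in_html and rendered_html and the 15-second timeout are my own placeholders, not part of any library:

```python
from bs4 import BeautifulSoup


def phrase_in_html(html, phrase):
    """Return True if phrase occurs in the page text, ignoring whitespace runs."""
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    return phrase in " ".join(text.split())


def rendered_html(url, phrase, timeout=15):
    """Load url in headless Firefox and wait up to `timeout` seconds for phrase."""
    # Selenium is imported here so phrase_in_html stays usable without a browser.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options as FirefoxOptions
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = FirefoxOptions()
    options.add_argument("--headless")
    browser = webdriver.Firefox(options=options)
    try:
        browser.get(url)
        try:
            # Poll until the phrase shows up in <body>, i.e. after scripts have run.
            WebDriverWait(browser, timeout).until(
                EC.text_to_be_present_in_element((By.TAG_NAME, "body"), phrase)
            )
        except TimeoutException:
            pass  # Fall through and return whatever was rendered.
        return browser.page_source
    finally:
        browser.quit()
```

Calling phrase_in_html(rendered_html(url, "Account suspended"), "Account suspended") would then report whether the phrase ever showed up. If it still returns False after the wait, the text is probably not in the rendered DOM in that form at all.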