5

I have a Python function that uses the requests library and BeautifulSoup to scrape a particular user's tweets.

import requests
from bs4 import BeautifulSoup

contents = requests.get("https://twitter.com/user")
soup = BeautifulSoup(contents.text, "html.parser")

When the requests library accesses Twitter, it is served the legacy version of the site. However, since Twitter recently dropped support for its legacy version, this no longer works: the response is an HTML page saying that this version of Twitter is out of date.

Is there a way to make the requests library access the newer version of Twitter?

J.t.p

3 Answers

0

I can't answer directly (and don't have enough points to comment), but I ran into the same issue and found some newer tools. https://github.com/bisguzar/twitter-scraper uses requests_html to fetch tweets (see its tweets.py module), and https://github.com/Mottl/GetOldTweets3/ is another powerful Python tool for scraping tweets.

Woolwit
-1

The requests library will access the URL you pass it. I recommend checking the Twitter API Docs and updating your code to correspond to the up-to-date version.

vauhochzett
-1

I also encountered this problem. The root cause of this is Twitter rejecting "legacy" browsers, which unfortunately includes Python's requests library.

Twitter figures out what browser you are using by looking at the User-Agent header sent as part of the request. So my solution to the problem was simply to spoof this header.

In your particular case, try something like:

import requests
from bs4 import BeautifulSoup

contents = requests.get(
    "https://twitter.com/user",
    headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"}
)
soup = BeautifulSoup(contents.text, "html.parser")
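
To see why Twitter treats requests as a legacy browser, it can help to compare the User-Agent that requests sends by default with the spoofed one. The sketch below only prepares the requests and never sends them, so it runs without network access. `build_request` and `BROWSER_UA` are illustrative names, not part of requests, and the browser string is just the one from the snippet above; any current browser string should behave similarly, though that is an assumption about Twitter's check.

```python
import requests

# Assumed browser-like User-Agent (copied from the snippet above).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/88.0.4324.150 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request, optionally overriding the User-Agent."""
    session = requests.Session()
    if user_agent is not None:
        # Headers set on the session are merged into every request it prepares.
        session.headers.update({"User-Agent": user_agent})
    return session.prepare_request(requests.Request("GET", url))


default_req = build_request("https://twitter.com/user")
spoofed_req = build_request("https://twitter.com/user", BROWSER_UA)

# The default header identifies the client as python-requests/<version>.
print(default_req.headers["User-Agent"])
# The spoofed header looks like a desktop Chrome browser.
print(spoofed_req.headers["User-Agent"])
```

Setting the header on a `requests.Session` rather than on each call means every later `session.get(...)` carries the same User-Agent, which is convenient when scraping more than one page.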