
I'm working on a script that downloads some data from Twitter profiles. I found that the HTML structure differs between a web browser and my Python "robot": when I open the page through Python's urllib2 and BeautifulSoup, I get different tag IDs and classes. Is there a way to get the same content as in a web browser?

I need it for resolving short URLs, because in the web browser the resolved URLs are stored in each link's title attribute.
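
Once the browser-style markup is available, pulling the resolved URLs out of the title attributes could look like the minimal sketch below. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages, and it assumes links shaped like `<a href="SHORT" title="EXPANDED">`; the sample markup and URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkTitleParser(HTMLParser):
    """Collect (href, title) pairs for every <a> tag that has a title attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Assumption: in the browser markup, the title holds the expanded URL.
        if attributes.get("title"):
            self.links.append((attributes.get("href"), attributes["title"]))


def resolve_from_titles(html):
    """Map each short href to the expanded URL stored in its title attribute."""
    parser = LinkTitleParser()
    parser.feed(html)
    parser.close()
    return dict(parser.links)


# Hypothetical markup: the t.co link and the expanded URL are made up.
sample = ('<p>See <a href="http://t.co/abc123" '
          'title="http://example.com/full/article">t.co/abc123</a></p>')
print(resolve_from_titles(sample))
```

Links without a title attribute are skipped, so the result only contains short URLs the page has already resolved.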

milanseitler

2 Answers


Most websites adapt their response to the User-Agent header sent with the request. If the header is missing, or left at a library default such as urllib2's "Python-urllib/2.7", it is obvious that the client is not a browser but some sort of script. You'll probably want to set a User-Agent header that resembles a "real" browser's.

Several ways to do this are described in the questions "Changing user agent on urllib2.urlopen" and "Fetch a Wikipedia article with Python".
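
A minimal sketch of the idea, assuming Python 3, where urllib2's functionality lives in `urllib.request` (in Python 2 the equivalent is `urllib2.Request(url, headers=...)`). The User-Agent string is only an illustrative browser value, and `build_request`/`fetch` are hypothetical helper names.

```python
import urllib.request

# Illustrative desktop-browser User-Agent; any current browser's string would do.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download a page while identifying as a browser (performs network I/O)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


req = build_request("https://example.com/")
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Keeping request construction separate from `fetch` makes it easy to check which headers will be sent without touching the network.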

On an unrelated note, you might want to use Requests, which has a much nicer API than the standard library's urllib2.

Yuval Adam

Don't screen-scrape Twitter for profile information; use the API. Your whole program will be much more robust. Changing your user agent and messing with their pages is also probably against their Terms of Service.
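
For the short-URL use case in the question, tweet objects returned by Twitter's API include, as far as I know, an `entities.urls` list that pairs each t.co `url` with its `expanded_url`, so no HTML parsing is needed. The sketch below only processes an already-decoded tweet; authentication and endpoints have changed over time and are not shown. The payload is an abridged, made-up example and `expanded_urls` is a hypothetical helper.

```python
import json

# Abridged, made-up tweet payload; real responses carry many more fields.
SAMPLE_TWEET = json.loads("""
{
  "text": "Worth reading https://t.co/abc123",
  "entities": {
    "urls": [
      {"url": "https://t.co/abc123",
       "expanded_url": "https://example.com/full/article",
       "display_url": "example.com/full/article"}
    ]
  }
}
""")


def expanded_urls(tweet):
    """Map each shortened t.co URL in a tweet object to its expanded URL."""
    urls = tweet.get("entities", {}).get("urls", [])
    return {entry["url"]: entry.get("expanded_url") for entry in urls}


print(expanded_urls(SAMPLE_TWEET))
```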

Noufal Ibrahim