Different Twitter HTML structure for browsers and python web opener

Question

I'm working on a script that downloads some data from Twitter profiles. I found that the HTML structure a web browser receives differs from what a Python "robot" receives: when I open the page through Python's urllib2 and BeautifulSoup, I get different tag IDs and classes. Is there a way to get the same content as in a web browser?

I need it for resolving short URLs, because in the web browser the resolved URLs are stored in the link's title attribute.

score 1 · Accepted Answer · edited May 23 '17 at 12:29

Many websites adapt their response according to the User-Agent header of the request. If none is set, or if the library's default is sent (urllib2 identifies itself as Python-urllib), it is obvious that the client is not a browser but some sort of script. You'll probably want to set a User-Agent header that resembles that of a "real" browser.

Lots of methods to do this are described here: Changing user agent on urllib2.urlopen and here: Fetch a Wikipedia article with Python
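As an illustration of the User-Agent approach described above, here is a minimal sketch. It uses Python 3's urllib.request, the successor to urllib2 (in urllib2 the equivalent call is urllib2.Request(url, headers=...)). The User-Agent string and the helper names build_request and fetch_html are assumptions chosen for the example, and whether a given site then serves the browser markup is not guaranteed.

```python
import urllib.request

# Assumption: an illustrative browser-like string; any realistic value may be used.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request object that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_USER_AGENT, timeout=10):
    """Fetch a page as text while sending the custom User-Agent header."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned text from fetch_html(url) can then be handed to BeautifulSoup in place of the output of a bare urlopen call.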

On an unrelated note, you might want to use Requests, which is a much better API than the standard urllib2.

score 1 · Answer 2 · answered Jan 07 '12 at 19:13


Don't screen-scrape Twitter profile information. Use the API. Your whole program will be much more robust. Changing your user agent to get around this is probably against their TOS, too.

answered Jan 07 '12 at 19:13

Noufal Ibrahim


This doesn't help me. On https://twitter.com/fn_polizei the link in every tweet has its expanded URL saved in the title attribute of the tag. When I use the API, I get this: http://api.twitter.com/1/statuses/user_timeline.xml?include_entities=true&include_rts=true&screen_name=fn_polizei&count=2 As you can see, it's still shortened (it seems their links are shortened twice). So my question should rather be "Is there any way to read the title attribute via a Python script?" – milanseitler Jan 07 '12 at 20:09
Feel free to accept a solution on this question and open a new, more specific one. – Giacomo Lacava Jan 07 '12 at 21:59
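
For the narrower question raised in the comment above (reading the title attribute of links from a Python script), a minimal sketch follows. It uses the standard library's html.parser rather than BeautifulSoup, which the question mentions. The sample markup, the class name LinkTitleCollector and the helper extract_link_titles are hypothetical; the real page structure may differ.

```python
from html.parser import HTMLParser


class LinkTitleCollector(HTMLParser):
    """Collect (href, title) pairs from <a> tags that have a title attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        title = attributes.get("title")
        if title:
            # Skip links without a title; they carry no expanded URL.
            self.links.append((attributes.get("href"), title))


def extract_link_titles(html):
    """Return a list of (href, title) tuples found in the given HTML text."""
    collector = LinkTitleCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical markup: a shortened link whose title holds the expanded URL.
SAMPLE_HTML = (
    '<p>News <a href="http://t.co/abc" '
    'title="http://example.com/full/article">t.co/abc</a> '
    '<a href="/about">About</a></p>'
)

print(extract_link_titles(SAMPLE_HTML))
```

The second link in the sample has no title attribute, so only the first one is reported.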
