I am trying to get the HTML content of several pages with Python 2.7.3 and urllib2. For most pages it works fine, but some pages, such as http://www.bbc.co.uk/news/entertainment-arts-22441507#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa, return this content instead:
This page is best viewed in an up-to-date web browser with style sheets (CSS) enabled. While you will be able to view the content of this page in your current browser, you will not be able to get the full visual experience. Please consider upgrading your browser software or enabling style sheets (CSS) if you are able to do so.
The same problem occurs with pages that require JavaScript: all I get back is the content inside the <noscript> tag.
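To tell which of many fetched pages came back as such a fallback, a rough heuristic can look for a known warning phrase, or for a body that contains nothing visible outside its <noscript> and <script> blocks. This is only a sketch: `looks_like_fallback` and the phrases in `FALLBACK_MARKERS` are hypothetical and would need tuning per site, and regular expressions are a crude substitute for a real HTML parser.

```python
import re

# Hypothetical phrases that signal a "no CSS / no JavaScript" fallback page.
FALLBACK_MARKERS = (
    "best viewed in an up-to-date web browser",
    "enable javascript",
)


def looks_like_fallback(html):
    """Rough heuristic: True if the page looks like a CSS/JS fallback page."""
    text = html.lower()
    if any(marker in text for marker in FALLBACK_MARKERS):
        return True
    if "<noscript" not in text:
        return False
    # Look only at the <body> if there is one, otherwise at the whole document.
    match = re.search(r"<body[^>]*>(.*?)</body>", text, re.S)
    body = match.group(1) if match else text
    # Drop <script> and <noscript> blocks, then strip all remaining tags.
    body = re.sub(r"<(script|noscript)[^>]*>.*?</\1>", "", body, flags=re.S)
    visible = re.sub(r"<[^>]+>", "", body).strip()
    # Nothing visible outside <noscript>/<script>: the real content is built by JS.
    return visible == ""
```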
Here is how I get the content:
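import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# open through the cookie-aware opener; urllib2.urlopen(url) would ignore it
response = opener.open(url).read().decode("utf-8")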
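A self-contained sketch of the same fetch, assuming the page is UTF-8 and that sending a browser-like User-Agent is worth trying. The request goes through `opener.open`, because `urllib2.urlopen` uses the global default opener rather than the one returned by `build_opener`. The helper names `make_opener` and `fetch` and the `BROWSER_UA` string are hypothetical; the imports fall back to the Python 3 module names so the snippet also runs there.

```python
try:  # Python 2, as in the question
    import urllib2 as urlrequest
    import cookielib
except ImportError:  # Python 3 equivalents
    import urllib.request as urlrequest
    import http.cookiejar as cookielib

# Example browser-like User-Agent (an assumption; any current browser UA is similar).
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0"


def make_opener(user_agent=BROWSER_UA):
    """Build an opener that keeps cookies and sends the given User-Agent."""
    cookie_jar = cookielib.CookieJar()
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(cookie_jar))
    # Replaces the default [('User-agent', 'Python-urllib/x.y')] entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch(url, opener):
    """Fetch url through the given opener and decode the body as UTF-8."""
    response = opener.open(url)
    try:
        return response.read().decode("utf-8")
    finally:
        response.close()
```

Usage: `html = fetch(url, make_opener())`. Reusing one opener for several URLs keeps any cookies the site sets between requests.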
Do I need to send additional headers to get the full page?
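To experiment with extra headers on a single request, a `Request` object can carry them explicitly. This is a sketch under assumptions, not a known fix: `build_request` and the values in `BROWSER_HEADERS` are hypothetical, and the helper also drops the `#...` fragment, which a browser never sends to the server anyway.

```python
try:  # Python 2
    import urllib2 as urlrequest
    from urlparse import urldefrag
except ImportError:  # Python 3
    import urllib.request as urlrequest
    from urllib.parse import urldefrag

# Hypothetical header set resembling what a desktop browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.5",
}


def build_request(url, headers=None):
    """Build a Request with explicit headers and without the #fragment."""
    url_without_fragment, _fragment = urldefrag(url)
    return urlrequest.Request(url_without_fragment, headers=dict(headers or BROWSER_HEADERS))
```

Usage with the opener from the question: `html = opener.open(build_request(url)).read().decode("utf-8")`. Headers only influence what the server chooses to send; urllib2 never executes JavaScript, so content that a page builds client-side will still be missing.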