When I fetch a web page with urllib2, I get a garbled string that I can't make sense of instead of the page source. My code is as follows:
import urllib2

url = 'http://finance.sina.com.cn/china/20150905/065523161502.shtml'
conn = urllib2.urlopen(url)
content = conn.read()
print content
Can anyone help me find out what's wrong? Thank you so much.
Update: Running the code above should reproduce the problem. This is what I get in Python:
{G?0????l???%ߐ?C0 ?K?z?%E |?B ??|?F?oeB?'??M6? y???~???;j????H????L?mv:??:]0Z?Wt6+Y+LV? VisV:캆P?Y?, O?m?p[8??m/???Y]????f.|x~Fa]S?op1M?H?imm5??g?????k?K#?|??? ???????p:O ??(? P?FThq1??N4??P???X??lD???F???6??z?0[?}??z??|??+?pR"s?Lq??&g#?v[((J~??w1@-?G?8???'?V+ks0?????%???5)
And this is what I expected (using curl):
<html>
<head>
<link rel="mask-icon" sizes="any" href="http://www.sina.com.cn/favicon.svg" color="red">
<meta charset="gbk"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
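One possible explanation, which I have not confirmed against this server: the response body may be gzip-compressed. curl shows readable HTML while urllib2 returns binary-looking bytes, and urllib2 does not decompress a response on its own. The sketch below assumes that is the cause. `decode_body` is a hypothetical helper name, not part of urllib2, and the `gbk` charset in the usage comment is taken from the `<meta charset="gbk"/>` tag in the curl output.

```python
import gzip
import io


def decode_body(raw, content_encoding=None):
    """Return the response body, gunzipped if it appears to be gzip-compressed.

    raw              -- the bytes returned by conn.read()
    content_encoding -- value of the Content-Encoding response header, or None
    """
    # Assumption: the server either labels the body via Content-Encoding,
    # or the body starts with the gzip magic bytes 0x1f 0x8b.
    labelled_gzip = bool(content_encoding) and 'gzip' in content_encoding.lower()
    looks_like_gzip = raw[:2] == b'\x1f\x8b'
    if labelled_gzip or looks_like_gzip:
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw


# Usage with the code from the question (Python 2, untested against this URL):
#   conn = urllib2.urlopen(url)
#   content = decode_body(conn.read(), conn.info().get('Content-Encoding'))
#   print content.decode('gbk')
```

If the body is not gzip-compressed, the helper returns the bytes unchanged, so the garbled output would then have a different cause.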