I am trying to scrape a site using this code:
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2

req = urllib2.Request('http://some website')
# add_header takes the header name and the value as two separate arguments
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')
f = urllib2.urlopen(req)
body = f.read()
f.close()
```
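For readers on Python 3, where urllib2 was folded into urllib.request, the same fetch might look roughly like the sketch below. The URL is a hypothetical placeholder, and the helper names build_request and fetch are illustrative, not part of any library.

```python
# Python 3 sketch: urllib2 became urllib.request in Python 3.
import urllib.request

# Hypothetical placeholder URL; substitute the real page.
URL = 'http://example.com/'
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')


def build_request(url):
    """Create a GET request that sends a browser-like User-Agent header."""
    req = urllib.request.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    return req


def fetch(url):
    """Download the page and return the raw response body as bytes."""
    # urlopen's response object closes itself when the with-block ends
    with urllib.request.urlopen(build_request(url)) as f:
        return f.read()
```

Calling body = fetch(URL) returns bytes, so the escape handling discussed below still applies to the result.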
This is part of the text returned by the read() method:
T\u00f3m l\u01b0\u1ee3c di\u1ec5n ti\u1ebfn Th\u01b0\u1ee3ng H\u1ed9i \u0110\u1ed3ng Gi\u00e1m M\u1ee5c v\u1ec1 Gia \u0110\u00ecnh\
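The text above contains literal backslash-u sequences (a backslash, the letter u, and four hex digits), not the accented characters themselves. A minimal offline reproduction, using the first few words of the sample as hypothetical input, shows how the unicode-escape codec turns them into real characters:

```python
# Offline sketch: the body holds literal \uXXXX sequences, not accented letters.
sample = b'T\\u00f3m l\\u01b0\\u1ee3c di\\u1ec5n ti\\u1ebfn'

# Each escape is six ASCII bytes: backslash, 'u', and four hex digits.
escape_len = len(b'\\u00f3')  # 6

# 'unicode-escape' interprets each \uXXXX sequence as a single code point.
text = sample.decode('unicode-escape')  # 'Tóm lược diễn tiến'

# Encode back to UTF-8 bytes if a byte string is needed.
utf8_bytes = text.encode('utf-8')
```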
How can I change the code above to get a result like this?
Tóm lược diễn tiến Thượng Hội Đồng Giám Mục về Gia Đình
Thank you.
My issue was solved by following mata's advice: decode the body with the unicode-escape codec. Here is the code that works for me. Thank you, everyone, for helping, especially mata.
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2

req = urllib2.Request('http://some website')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')
f = urllib2.urlopen(req)
# Turn the literal \uXXXX sequences into characters, then encode as UTF-8 bytes
body = f.read().decode('unicode-escape').encode('utf-8')
f.close()
```
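One caveat worth knowing: the unicode-escape codec decodes every byte that is not part of an escape sequence as Latin-1, so if the page also contains raw UTF-8 text, those characters come out garbled (for example "café" becomes "cafÃ©"). The sketch below, written for Python 3, decodes the bytes as UTF-8 first and then expands only the \uXXXX sequences. It also shows json.loads as an alternative, under the assumption (not confirmed by the question) that the response is JSON, where \uXXXX escapes are common. The function names and sample data are hypothetical.

```python
import json
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_u(text):
    """Replace literal \\uXXXX sequences in a str with the characters they name.

    Limitation: escaped surrogate pairs (two consecutive escapes) are not
    combined; each half becomes a lone surrogate character.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_body(body):
    """Decode raw response bytes as UTF-8, then expand \\uXXXX escapes."""
    return unescape_u(body.decode('utf-8'))


def parse_json_body(body):
    """Assuming the body is JSON, let the json module handle the escapes."""
    return json.loads(body.decode('utf-8'))


# Hypothetical mixed input: escaped Vietnamese plus a raw UTF-8 'é'.
mixed = 'Gi\\u00e1m M\\u1ee5c, caf\u00e9'.encode('utf-8')

# Hypothetical JSON payload containing escaped characters.
payload = b'{"title": "Gia \\u0110\\u00ecnh"}'
```

With this input, decode_body(mixed) keeps "café" intact, whereas mixed.decode('unicode-escape') would turn it into "cafÃ©".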