I'm scraping simple text files from a URL.
import urllib2

def scrape_contents_ex(url):
    # Fetch the resource and return its body as a raw byte string
    data = urllib2.urlopen(url)
    return data.read()
The problem is that the string it returns is full of whitespace escape sequences such as "\t", "\r" and "\n".
Example:
When I print the output string in Python, it is displayed with various backslash escape sequences:
I don't know how to properly handle the output I read from urlopen. I want to store these contents in PostgreSQL. A further complication is that the content will very likely contain non-ASCII Unicode text (Chinese characters, Cyrillic, etc.).
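To make the goal concrete, here is a minimal sketch of the kind of handling being asked about: decoding the raw bytes to Unicode text and normalizing line endings. The helper name `decode_and_normalize` is hypothetical, and the sketch assumes the files are UTF-8 encoded, which may not hold for every source. Storing the result in PostgreSQL is not shown.

```python
def decode_and_normalize(raw_bytes, encoding="utf-8"):
    # Hypothetical helper, not part of the scraper above.
    # Assumption: the encoding is known; undecodable bytes become U+FFFD.
    text = raw_bytes.decode(encoding, "replace")
    # Convert Windows (\r\n) and bare carriage-return (\r) line endings to \n.
    return text.replace(u"\r\n", u"\n").replace(u"\r", u"\n")


# Sample bytes standing in for what data.read() might return:
# two Chinese characters, a Windows line ending, a tab, and a bare \r.
sample = u"\u4e2d\u6587\r\n\tabc\r".encode("utf-8")
cleaned = decode_and_normalize(sample)
```

Tabs are left untouched here, since they may be meaningful in the source files. The result is a Unicode string rather than a byte string.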
What is the proper and robust way to read and store this?