I have huge CSV files that contain '\xc3\x84'-style escape sequences instead of German umlauts, because I scraped HTML with BeautifulSoup and wrote it to the CSV files using Python 2.7.8.
I managed to replace all of those sequences with the help of this question: Python 2.7.1: How to Open, Edit and Close a CSV file
and now my code looks like this:
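    # -*- coding: utf-8 -*-
    import csv

    new_rows = []
    # Keys are the literal text \xc3\x84 etc. that ended up in the CSV; values are
    # the UTF-8 bytes of each umlaut (the source file must be saved as UTF-8).
    umlaut = {'\\xc3\\x84': 'Ä', '\\xc3\\x96': 'Ö', '\\xc3\\x9c': 'Ü', '\\xc3\\xa4': 'ä', '\\xc3\\xb6': 'ö', '\\xc3\\xbc': 'ü'}

    # Python 2's csv module expects files opened in binary mode.
    with open('file1.csv', 'rb') as csvFile:
        reader = csv.reader(csvFile)
        for row in reader:
            new_row = row
            # Apply every replacement to every cell of the row.
            for key, value in umlaut.items():
                new_row = [x.replace(key, value) for x in new_row]
            new_rows.append(new_row)

    with open('file2.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(new_rows)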
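The dictionary above only covers six umlauts, so ß (\xc3\x9f) or any other non-ASCII character would stay escaped. A more general alternative (a different technique from the one in the question, sketched in Python 3 because it relies on bytes.fromhex) is to find every run of literal \xNN escapes, turn it back into bytes and decode those bytes as UTF-8. The helper name unescape_utf8 is hypothetical, and runs that are not valid UTF-8 are deliberately left untouched.

```python
import re

# A run of one or more literal "\xNN" escapes, e.g. the text \xc3\xb6
ESCAPE_RUN = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')


def unescape_utf8(text):
    """Replace runs of literal \\xNN escapes with the UTF-8 characters they encode.

    Hypothetical helper (Python 3); runs that are not valid UTF-8 are kept as-is.
    """
    def decode_run(match):
        hex_digits = match.group(0).replace('\\x', '')  # \xc3\xb6 -> "c3b6"
        try:
            return bytes.fromhex(hex_digits).decode('utf-8')
        except UnicodeDecodeError:
            return match.group(0)  # not UTF-8: leave the original text

    return ESCAPE_RUN.sub(decode_run, text)
```

Applied per cell, e.g. new_row = [unescape_utf8(x) for x in row], it would replace the whole umlaut loop, assuming every escape in the file comes from UTF-8 encoded text.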
When I open the CSV file, I see KÃ¶ln instead of Köln, and similar problems with the other German umlauts. I can fix this manually by opening the CSV file in Notepad and saving it as UTF-8, but I want to automate this with Python.
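A likely reason the Notepad step helps: Notepad's "UTF-8" option writes a byte-order mark (the bytes EF BB BF) at the start of the file, and Excel (assumed here to be the program opening the CSV; the question doesn't say) typically needs that mark to recognise UTF-8, otherwise it decodes the file with the Windows code page. Assuming file2.csv already holds UTF-8 bytes, the manual step can be scripted by copying the file with the BOM prepended. This sketch only moves bytes, so it should run on both Python 2.7 and Python 3; both function names are hypothetical.

```python
import codecs
import shutil


def copy_with_utf8_bom(src, dst):
    """Copy binary stream src to dst, prefixing the UTF-8 BOM if it is missing."""
    # Assumption: src already contains UTF-8 text; nothing is re-encoded here.
    head = src.read(len(codecs.BOM_UTF8))
    if head != codecs.BOM_UTF8:
        dst.write(codecs.BOM_UTF8)
    dst.write(head)
    shutil.copyfileobj(src, dst)  # copy the remainder in chunks (fine for huge files)


def add_utf8_bom(src_path, dst_path):
    # Binary mode on both ends: the bytes are copied unchanged apart from the BOM.
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        copy_with_utf8_bom(src, dst)
```

For example, add_utf8_bom('file2.csv', 'file2_bom.csv') writes a copy with the mark (the output name is only illustrative). In the Python 2 script itself, calling f.write(codecs.BOM_UTF8) right after opening file2.csv with 'wb', before creating csv.writer(f), should have the same effect without a second pass.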
I do not quite get how to use the UnicodeWriter:
https://docs.python.org/2/library/csv.html#examples
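The UnicodeWriter recipe exists because Python 2's csv module cannot write unicode objects directly. In Python 3 the csv module works with str, so a text stream opened with an encoding does the same job. The sketch below is a Python 3 alternative, not the UnicodeWriter recipe itself; it assumes Excel is the target (hence the utf-8-sig codec, which writes the BOM), and write_rows_utf8_bom is a hypothetical name. It takes an already-open binary stream so it can also write to an in-memory buffer.

```python
import csv
import io


def write_rows_utf8_bom(binary_stream, rows):
    """Write rows as CSV encoded as UTF-8 with a BOM (Python 3 only)."""
    # utf-8-sig emits the BOM before the first write; newline='' lets the csv
    # module control line endings itself, as the Python 3 csv docs recommend.
    text = io.TextIOWrapper(binary_stream, encoding='utf-8-sig', newline='')
    csv.writer(text).writerows(rows)
    # detach() flushes and leaves binary_stream open for the caller.
    text.detach()
```

With a real file this would be called as write_rows_utf8_bom(f, new_rows) inside with open('file2.csv', 'wb') as f:. The everyday equivalent is to open the file with open('file2.csv', 'w', encoding='utf-8-sig', newline='') in a with block and pass it straight to csv.writer.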
The answers and solutions I found here on Stack Overflow all seem a bit complicated.
My questions are: how would I use, for example, the UnicodeWriter in my case? Do you know a super easy function that does something like file2.encode('utf-8')? And if no such simple function exists in Python, why not, given how common encoding errors are?