
I'm trying to write a list of dictionaries to a .csv file using csv.writer's writerow() method. I scraped the data from a website. Below is a sample of the list, with two entries that illustrate my problem:

    sourcedictlist = [
        {'MBFC Rating': 'Left-Center',
         'MBFC URL': 'https://mediabiasfactcheck.com/beijing-review/',
         'News Outlet': 'Beijing Review',
         'News Outlet URL': 'http://www.bjreview.com/',
         'Notes': 'Notes:\xa0Founded in March 1958\xa0as the weekly Peking Review, it was an important tool for the People’s Republic of China\xa0government to communicate to the rest of world.\xa0Has a Communist, Maoist perspective, but reports new factually.'},
        {'MBFC Rating': 'Right',
         'MBFC URL': 'https://mediabiasfactcheck.com/daily-sabah/',
         'News Outlet': 'Daily Sabah',
         'News Outlet URL': 'http://www.dailysabah.com/',
         'Notes': 'Notes:\xa0Daily Sabah\xa0is an English, German and Arabic-language daily newpaper published in Turkey\xa0and owned by Turkuvaz Media Group. Foreign Policy\xa0has labeled the Daily Sabah as a mouthpiece of the AKP and especially Recep Tayyip Erdoğan, the current president of Turkey. (Wikipedia) \xa0Reports USA news with a right of center bias.'}]
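As an aside, because every dict in the list has the same keys, csv.DictWriter from the standard library can write it directly, with the column order fixed by fieldnames rather than by dict ordering (plain dicts are not guaranteed to keep insertion order in Python 3.5, the version mentioned in the comments below). A minimal sketch, using two shortened, hypothetical rows and an in-memory buffer instead of a real file:

```python
import csv
import io

# Shortened, hypothetical rows with the same shape as sourcedictlist
rows = [
    {'News Outlet': 'Beijing Review', 'MBFC Rating': 'Left-Center'},
    {'News Outlet': 'Daily Sabah', 'MBFC Rating': 'Right'},
]

# Column order is set explicitly, so it does not depend on dict ordering
fieldnames = ['News Outlet', 'MBFC Rating']

buf = io.StringIO()  # in-memory stand-in for the real output file
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()    # writes the fieldnames as the first row
writer.writerows(rows)  # one CSV row per dict

print(buf.getvalue())
```

With a real file, the handle returned by open(OUTPUTFILE, 'w', newline='', encoding=...) would take the place of buf.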

I initially tried the following code:

    import csv

    OUTPUTFILE = 'output.csv'  # hypothetical path; the real value isn't shown

    with open(OUTPUTFILE, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(sourcedictlist[0].keys())  # header row
        for row in sourcedictlist:
            writer.writerow(row.values())

Running this raises the following error, which I've traced to the "ğ" character in "Erdoğan" in the "Notes" value of the second dict:

UnicodeEncodeError: 'charmap' codec can't encode character '\u011f' in position 240: character maps to <undefined>
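The 'charmap' in that message points to a single-byte legacy code page: on Windows, open() without encoding= falls back to locale.getpreferredencoding(False). A minimal sketch reproducing the failure, assuming that default is cp1252 (common on Western-locale Windows, but not confirmed by the question):

```python
import locale

# open() without encoding= falls back to this value; it is platform dependent
# (assumed here to be cp1252, a common default on Western-locale Windows)
print(locale.getpreferredencoding(False))

name = 'Erdo\u011fan'  # 'Erdoğan'; U+011F is LATIN SMALL LETTER G WITH BREVE

error_message = None
try:
    name.encode('cp1252')  # cp1252 has no byte for U+011F
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)

# UTF-8 can encode every Unicode character, including U+011F (as two bytes)
utf8_bytes = name.encode('utf-8')
print(utf8_bytes)  # b'Erdo\xc4\x9fan'
```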

Then I tried adding encoding="UTF-8" to the open() call:

    import csv

    with open(OUTPUTFILE, 'w', newline='', encoding="UTF-8") as f:
        writer = csv.writer(f)
        writer.writerow(sourcedictlist[0].keys())  # header row
        for row in sourcedictlist:
            writer.writerow(row.values())
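Once the error is gone, one way to tell whether the file contents are wrong or only displayed wrong is to read the file back in Python with the same encoding. A minimal sketch, using a temporary file and a shortened sample row (both stand-ins, not the real OUTPUTFILE or data); if the rows compare equal, the bytes on disk are intact:

```python
import csv
import os
import tempfile

# Shortened sample containing the problem characters (hypothetical data)
rows = [['News Outlet', 'Notes'],
        ['Daily Sabah', 'Recep Tayyip Erdo\u011fan\xa0(Wikipedia)']]

# Temporary path standing in for OUTPUTFILE (hypothetical)
path = os.path.join(tempfile.mkdtemp(), 'check.csv')

with open(path, 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)

# Read it back with the SAME encoding and compare
with open(path, newline='', encoding='utf-8') as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```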

Adding the encoding gets rid of the error, and the loop writes every entry to the CSV file, but the text looks garbled when I open the file (see below):

Notes: Founded in March 1958 as the weekly Peking Review, it was an important tool for the People’s Republic of China government to communicate to the rest of world. Has a Communist, Maoist perspective, but reports new factually.

Notes: Daily Sabah is an English, German and Arabic-language daily newpaper published in Turkey and owned by Turkuvaz Media Group. Foreign Policy has labeled the Daily Sabah as a mouthpiece of the AKP and especially Recep Tayyip Erdoğan, the current president of Turkey. (Wikipedia)  Reports USA news with a right of center bias.
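If the garbling looks like "Â" before each no-break space, "â€™" in place of "’" and "ÄŸ" in place of "ğ", the file was most likely written correctly and is being decoded as cp1252 by the program that opens it. The question doesn't say which program that is; the sketch below assumes Excel on Windows, which commonly treats a CSV without a byte order mark as the system code page. It reproduces the garbling and shows one common workaround, Python's built-in 'utf-8-sig' codec, which prepends a UTF-8 byte order mark:

```python
import csv
import os
import tempfile

# Characters from the Notes fields: no-break space, right single quote, g with breve
samples = ['\xa0', '\u2019', '\u011f']

# What a viewer that decodes UTF-8 bytes as cp1252 would display
# (assumption: the CSV viewer falls back to cp1252 when there is no BOM)
garbled = {ch: ch.encode('utf-8').decode('cp1252') for ch in samples}
for ch, shown in garbled.items():
    print(repr(ch), '->', repr(shown))

# 'utf-8-sig' writes the byte order mark EF BB BF before the data,
# which Excel uses to recognise the file as UTF-8
path = os.path.join(tempfile.mkdtemp(), 'sources.csv')  # hypothetical output path
with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerow(['Notes', 'Recep Tayyip Erdo\u011fan'])

with open(path, 'rb') as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'
```

In the question's code, this would amount to passing encoding='utf-8-sig' instead of encoding="UTF-8" to open(); the rest of the loop stays the same.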

Can someone please explain what I'm doing wrong here?

user82081
  • Check these answers: http://stackoverflow.com/questions/32382686/unicodeencodeerror-charmap-codec-cant-encode-character-u2010-character-m http://stackoverflow.com/questions/14284269/why-doesnt-python-recognize-my-utf-8-encoded-source-file/14284404#14284404 – Siva Shanmugam Dec 08 '16 at 04:32
  • Thank you, I'd found those examples before, but they haven't helped solve my problem. I've tried running chcp 65001 in the Windows console, the Python console and the IPython console, but each of those returns an error. I've also tried using encoding='ISO-8859-9', which was suggested [here](https://stackoverflow.com/questions/37510650/how-can-i-replace-unicode-characters-with-turkish-characters-in-a-text-file-with). I'm using Spyder 2.3.9 with Python 3.5 as part of the Anaconda package. – user82081 Dec 09 '16 at 01:20
