UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2247: character maps to <undefined>

Question

When I run my code (Python 3) I keep getting this error:

Traceback (most recent call last):
  File "country.py", line 16, in <module>
    for row in csv_reader:
  File "C:\Users\benny\Anaconda3\lib\csv.py", line 112, in __next__
    row = next(self.reader)
  File "C:\Users\benny\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2247: character maps to <undefined>

I have tried these solutions but none work.

The code only prints one line if I fix the encoding problem by adding encoding='UTF-8'. If I leave the encoding problem in place, it prints almost 700 rows before it throws an error. Either way, it still won't work.

import csv
import country_converter as coco

with open('Interpol.csv', 'r') as csv_file, open('Interpol_Extra.csv', 'w', newline='') as new_file:

    csv_reader = csv.DictReader(csv_file)

    fieldnames = ['Case Happened - UN Region', 'Case Happened - Continent', 
    'Recovered - UN Region', 'Recovered - Continent'] + csv_reader.fieldnames

    csv_writer = csv.DictWriter(new_file, fieldnames)

    csv_writer.writeheader()

    for row in csv_reader:
        case_country_name = row['Case happened - Country']
        recovered_country_name = row['Recovered - Country']

        if case_country_name:
            row['Case Happened - UN Region'] = coco.convert(names=case_country_name, to='UNregion')
            row['Case Happened - Continent'] = coco.convert(names=case_country_name, to='Continent')

        if recovered_country_name:
            row['Recovered - UN Region'] = coco.convert(names=recovered_country_name, to='UNregion')
            row['Recovered - Continent'] = coco.convert(names=recovered_country_name, to='Continent')

    csv_writer.writerow(row)

Try this solution https://stackoverflow.com/a/9233174/6619424 — Arun, Nov 09 '17 at 10:20
@Arun as described in the question, I have tried adding encoding="utf8" which was advised in another answer. It does not work. — TinyTiger, Nov 09 '17 at 10:22
@Arun It only outputs one row if I fix the encoding problem. If I leave the encoding problem in place it will output about 700 rows (but still not enough). — TinyTiger, Nov 09 '17 at 10:29
Try this: ```import codecs; with codecs.open('Interpol.csv', 'r', encoding='utf-8', errors='ignore') as csv_file:``` — Arun, Nov 09 '17 at 10:42
There are 2 ways to pass this kind of problem: the *ignore* suggested by @Arun (which IMHO is just a workaround), or the identification of the offending string. The latter requires that you show the line where the error occurs in a normal format and in hexadecimal to allow readers to make sure of the actual encoding. — Serge Ballesta, Nov 09 '17 at 10:47
@Arun I am trying this `import codecs; with codecs.open('Interpol.csv', 'r', encoding='utf-8', errors='ignore') as csv_file, codecs.open('Interpol_Extra.csv', 'w', newline='') as new_file:` but it's giving me a syntax error. — TinyTiger, Nov 09 '17 at 10:52
@SergeBallesta does the code in my question help identify the offending string? That's all the info I am getting. And how do I fix it? — TinyTiger, Nov 09 '17 at 10:54
@Arun Have also tried putting that codec onto both reader and writer like this `import codecs; codecs.open('Interpol.csv', 'r', encoding='utf-8', errors='ignore') as csv_file, codecs.open('Interpol_Extra.csv', 'w', newline='', encoding='utf-8', errors='ignore') as new_file:` but it has the problem of only outputting one row. — TinyTiger, Nov 09 '17 at 11:04
@Arun it was a problem with my code that I just fixed! Your help identifying the encoding was great though, so thanks. — TinyTiger, Nov 09 '17 at 11:16
@bennygill: Glad to hear that. Please post your answer so it will be helpful for others. — Arun, Nov 09 '17 at 11:18
Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. — halfer, Nov 09 '17 at 12:00

score 1 · Accepted Answer · answered Nov 13 '17 at 01:41

This is the code I used which finally worked.

As Arun suggested in the comments, if you have a similar problem, read all the answers on this question. It has the most succinct and helpful information on Stack Exchange for this problem.

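Then re-check your code to make sure it is valid. In my case, correcting some wrong indentation was what finally fixed it: the csv_writer.writerow(row) call has to sit inside the for loop, otherwise only one row is written.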
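
To see why the position of the write call matters, here is a minimal sketch that uses in-memory data instead of Interpol.csv. The names SAMPLE and copy_rows are hypothetical and exist only for this illustration. It copies the same three rows twice, once writing after the loop and once writing inside it.

```python
import csv
import io

# Hypothetical sample data standing in for the real input file.
SAMPLE = "country,count\nFrance,1\nJapan,2\nBrazil,3\n"


def copy_rows(write_inside_loop):
    """Copy SAMPLE to an in-memory CSV and return how many data rows were written."""
    reader = csv.DictReader(io.StringIO(SAMPLE))
    out = io.StringIO()
    writer = csv.DictWriter(out, reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if write_inside_loop:
            writer.writerow(row)  # runs once per input row
    if not write_inside_loop:
        writer.writerow(row)  # runs once, after the loop, with the last row only
    # Subtract 1 so the header line is not counted.
    return len(out.getvalue().splitlines()) - 1


rows_outside = copy_rows(write_inside_loop=False)
rows_inside = copy_rows(write_inside_loop=True)
print(rows_outside, rows_inside)
```

With the write call after the loop, only the final row reaches the output, which matches the "only one row" symptom described in the question.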

import csv
import country_converter as coco

with open('Interpol.csv', 'r', encoding="utf-8") as csv_file, open('Interpol_Extra.csv', 'w', newline='', encoding="utf-8") as new_file:

    csv_reader = csv.DictReader(csv_file)

    fieldnames = ['Case Happened - UN Region', 'Case Happened - Continent', 
    'Recovered - UN Region', 'Recovered - Continent'] + csv_reader.fieldnames

    csv_writer = csv.DictWriter(new_file, fieldnames)

    csv_writer.writeheader()

    for row in csv_reader:
        case_country_name = row['Case happened - Country']
        recovered_country_name = row['Recovered - Country']

        if case_country_name:
            row['Case Happened - UN Region'] = coco.convert(names=case_country_name, to='UNregion')
            row['Case Happened - Continent'] = coco.convert(names=case_country_name, to='Continent')

        if recovered_country_name:
            row['Recovered - UN Region'] = coco.convert(names=recovered_country_name, to='UNregion')
            row['Recovered - Continent'] = coco.convert(names=recovered_country_name, to='Continent')

        csv_writer.writerow(row)
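
If you want to check which encoding a file really uses, as Serge Ballesta suggested in the comments, it can help to look at the bytes around the failure in hexadecimal. The sketch below is an illustration only, not something I ran against Interpol.csv. The helper show_bytes_around and the sample string are hypothetical. It relies on two facts: byte 0x9d has no character assigned in cp1252 (the codec shown in the traceback), and the right double quotation mark U+201D is encoded as E2 80 9D in UTF-8.

```python
def show_bytes_around(data, position, width=8):
    """Return the bytes around `position` as space-separated hex (hypothetical helper)."""
    start = max(0, position - width)
    return " ".join(f"{b:02x}" for b in data[start:position + width])


# Hypothetical sample: text containing U+201D, encoded as UTF-8 (E2 80 9D).
raw = "He said \u201dhi\u201d".encode("utf-8")

failed_at = None
context = None
try:
    raw.decode("cp1252")  # cp1252 has no mapping for byte 0x9d
except UnicodeDecodeError as exc:
    failed_at = exc.start  # offset of the undecodable byte within `raw`
    context = show_bytes_around(raw, exc.start, width=2)

print(failed_at, context)
print(raw.decode("utf-8"))  # the same bytes decode cleanly as UTF-8
```

For a real file, you could read it with open('Interpol.csv', 'rb') and run the same try/except on the whole byte string. The position reported in a traceback from a text-mode read may be relative to a buffered chunk rather than to the start of the file, so decoding the full contents yourself is a more reliable way to get the offset. Seeing a sequence such as e2 80 9d around the failure is a strong hint that the file is UTF-8, which is why passing encoding="utf-8" to open() makes the error go away.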
