
When I run my code (Python 3) I keep getting this error:

Traceback (most recent call last):
  File "country.py", line 16, in <module>
    for row in csv_reader:
  File "C:\Users\benny\Anaconda3\lib\csv.py", line 112, in __next__
    row = next(self.reader)
  File "C:\Users\benny\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2247: character maps to <undefined>

I have tried these solutions but none work.

The code only prints one line if I fix the encoding problem by adding encoding='UTF-8'. If I leave the encoding problem in place, it prints almost 700 rows before it throws the error. Either way, it doesn't work.

import csv
import country_converter as coco

with open('Interpol.csv', 'r') as csv_file, open('Interpol_Extra.csv', 'w', newline='') as new_file:

    csv_reader = csv.DictReader(csv_file)

    fieldnames = ['Case Happened - UN Region', 'Case Happened - Continent', 
    'Recovered - UN Region', 'Recovered - Continent'] + csv_reader.fieldnames

    csv_writer = csv.DictWriter(new_file, fieldnames)

    csv_writer.writeheader()

    for row in csv_reader:
        case_country_name = row['Case happened - Country']
        recovered_country_name = row['Recovered - Country']

        if case_country_name:
            row['Case Happened - UN Region'] = coco.convert(names=case_country_name, to='UNregion')
            row['Case Happened - Continent'] = coco.convert(names=case_country_name, to='Continent')

        if recovered_country_name:
            row['Recovered - UN Region'] = coco.convert(names=recovered_country_name, to='UNregion')
            row['Recovered - Continent'] = coco.convert(names=recovered_country_name, to='Continent')

    csv_writer.writerow(row)
halfer
TinyTiger
  • Try this solution https://stackoverflow.com/a/9233174/6619424 – Arun Nov 09 '17 at 10:20
  • @Arun as described in the question, I have tried adding encoding="utf8" which was advised in another answer. It does not work. – TinyTiger Nov 09 '17 at 10:22
  • @Arun It only outputs one row if I fix the encoding problem. If I leave the encoding problem in place it will output about 700 rows (but still not enough). – TinyTiger Nov 09 '17 at 10:29
  • Did you identify the encoding of csv file? – Arun Nov 09 '17 at 10:34
  • @Arun yes it was UTF-8 – TinyTiger Nov 09 '17 at 10:36
  • Try this: ```import codecs; with codecs.open('Interpol.csv', 'r', encoding='utf-8', errors='ignore') as csv_file:``` – Arun Nov 09 '17 at 10:42
  • There are 2 ways to pass this kind of problem: the *ignore* suggested by @Arun (which IMHO is just a workaround), or the identification of the offending string. The latter requires that you show the line where the error occurs in a normal format and in hexadecimal to allow readers to make sure of the actual encoding. – Serge Ballesta Nov 09 '17 at 10:47
  • @Arun I am trying this `import codecs; with codecs.open('Interpol.csv', 'r', encoding='utf-8', errors='ignore') as csv_file, codecs.open('Interpol_Extra.csv', 'w', newline='') as new_file:` but it's giving me a syntax error. – TinyTiger Nov 09 '17 at 10:52
  • @SergeBallesta does the code in my question help identify the offending string? That's all the info I am getting. And how do I fix it? – TinyTiger Nov 09 '17 at 10:54
  • @Arun Have also tried putting that codec onto both reader and writer like this `import codecs; codecs.open('Interpol.csv', 'r', encoding='utf-8', errors='ignore') as csv_file, codecs.open('Interpol_Extra.csv', 'w', newline='', encoding='utf-8', errors='ignore') as new_file:` but it has the problem of only outputting one row. – TinyTiger Nov 09 '17 at 11:04
  • @Arun it was a problem with my code that I just fixed! Your help identifying the encoding was great though, so thanks. – TinyTiger Nov 09 '17 at 11:16
  • @bennygill: Glad to hear that. Please post your answer so it will be helpful for others. – Arun Nov 09 '17 at 11:18
  • Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. – halfer Nov 09 '17 at 12:00

1 Answer

This is the code I used which finally worked.

As suggested by Arun in the comments, if you're having a similar problem you should read all the answers on this question. It has the most succinct and helpful info on Stack Exchange for this problem.
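For anyone wondering why the traceback mentions cp1252 at all: without an explicit encoding, open() on my Windows machine fell back to cp1252, and byte 0x9d has no character assigned in that code page. The sketch below reproduces the same kind of failure in isolation. The character used (U+201D, a right double quotation mark) is only an assumption for illustration; I don't know which character in the real file actually triggered the error.

```python
# Illustrative only: U+201D is an assumed example character, not necessarily
# the one that appears in Interpol.csv.
utf8_bytes = "\u201d".encode("utf-8")
print(utf8_bytes)  # the UTF-8 encoding ends in byte 0x9d

try:
    # Decoding UTF-8 bytes with the wrong codec fails on 0x9d,
    # which is undefined in Python's cp1252 table.
    utf8_bytes.decode("cp1252")
    decoded_with_cp1252 = True
except UnicodeDecodeError as exc:
    decoded_with_cp1252 = False
    print(exc)

# Decoding with the encoding the bytes were written in works.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == "\u201d")
```

Passing encoding="utf-8" to open() is the equivalent fix when reading a file.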

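Then re-check your code to make sure it is valid. In my case, correcting some wrong indentation is what finally fixed it: csv_writer.writerow(row) was sitting outside the for loop, so it ran only once, after the loop had finished, and wrote a single row.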

import csv
import country_converter as coco

with open('Interpol.csv', 'r', encoding="utf-8") as csv_file, open('Interpol_Extra.csv', 'w', newline='', encoding="utf-8") as new_file:

    csv_reader = csv.DictReader(csv_file)

    fieldnames = ['Case Happened - UN Region', 'Case Happened - Continent', 
    'Recovered - UN Region', 'Recovered - Continent'] + csv_reader.fieldnames

    csv_writer = csv.DictWriter(new_file, fieldnames)

    csv_writer.writeheader()

    for row in csv_reader:
        case_country_name = row['Case happened - Country']
        recovered_country_name = row['Recovered - Country']

        if case_country_name:
            row['Case Happened - UN Region'] = coco.convert(names=case_country_name, to='UNregion')
            row['Case Happened - Continent'] = coco.convert(names=case_country_name, to='Continent')

        if recovered_country_name:
            row['Recovered - UN Region'] = coco.convert(names=recovered_country_name, to='UNregion')
            row['Recovered - Continent'] = coco.convert(names=recovered_country_name, to='Continent')

        csv_writer.writerow(row)
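To show the effect of the indentation on its own, here is a minimal sketch that leaves out country_converter and the real file. The sample data, the copy_rows helper, and its write_inside_loop flag are all made up for illustration; only the csv and io calls are standard library.

```python
import csv
import io

# Hypothetical in-memory data standing in for Interpol.csv.
SAMPLE = "name,country\nA,France\nB,Japan\nC,Brazil\n"


def copy_rows(write_inside_loop):
    """Copy SAMPLE through DictReader/DictWriter and return the output lines."""
    reader = csv.DictReader(io.StringIO(SAMPLE))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()

    for row in reader:
        if write_inside_loop:
            # Indented under the loop: runs once per input row.
            writer.writerow(row)

    if not write_inside_loop:
        # Dedented out of the loop: runs once, with whatever `row` was last.
        writer.writerow(row)

    return out.getvalue().splitlines()


print(copy_rows(write_inside_loop=False))  # header plus only the last row
print(copy_rows(write_inside_loop=True))   # header plus every row
```

With the write outside the loop, the output contains the header and only the final data row, which matches the "only outputs one row" symptom I was seeing once the encoding was set correctly.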
TinyTiger