I'm trying to remove accents from data in a CSV file. I use the remove_accents function (see below), but for that I need to open my CSV files as UTF-8, and I get the error: 'encoding' is an invalid keyword argument for this function
I've seen that I may have to use Python 3 and run python3 ./myscript.py. Is this the right way to do it? Or is there another way to remove accents without having to install Python 3? Any help would be much appreciated.

#!/usr/bin/env python

import re
import string
import csv
import unicodedata

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if \
    unicodedata.category(x)[0] == 'L').lower()


reader=csv.reader(open('infile.csv', 'r', encoding='utf-8'), delimiter='\t')
writer=csv.writer(open('outfile.csv', 'w', encoding='utf-8'), delimiter=',')

for line in reader:
    if line[0] != '':
        person=re.split(' ',line[0])

        first_name = person[0].strip().upper()
        first_name1=unicode(first_name)
        first_name2=remove_accents(first_name1)
        if len(person) == 2:
            last_name=person[1].strip().upper()
            line[0]=last_name
        line[15]=first_name2

    writer.writerow(line)
Reveclair
  • Seems like a duplicate of http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unico – Gilles Quénot Oct 07 '12 at 15:06

1 Answer

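You need to use codecs.open() if you want to be able to specify an encoding; in Python 2, the built-in open() does not accept an encoding argument. Also, take a look at unidecode, a third-party package that transliterates Unicode text to ASCII.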
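
A minimal sketch of the codecs.open() suggestion, assuming the goal is only to strip accents from a UTF-8 text file. The helper name strip_accents_in_file and its parameters are hypothetical, and this remove_accents drops only combining marks (so spaces, digits and delimiters survive), which differs from the version in the question. It treats the file as plain lines rather than parsing it with the csv module.

```python
# -*- coding: utf-8 -*-
import codecs
import unicodedata


def remove_accents(text):
    """Return text with combining accent marks removed (expects unicode text)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() is non-zero for combining marks; keep the rest.
    return u''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_accents_in_file(src_path, dst_path, encoding='utf-8'):
    """Hypothetical helper: copy src_path to dst_path with accents removed."""
    # codecs.open() accepts an encoding and yields/accepts unicode text.
    with codecs.open(src_path, 'r', encoding=encoding) as src:
        with codecs.open(dst_path, 'w', encoding=encoding) as dst:
            for line in src:
                dst.write(remove_accents(line))
```

For example, remove_accents(u'H\xe9l\xe8ne') returns u'Helene'. Note that in Python 2 the csv module does not handle unicode input directly, so combining it with codecs.open() may need an extra encode/decode step per row.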

Ignacio Vazquez-Abrams
  • Thanks for answering. I now get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 24: ordinal not in range(128) – Reveclair Oct 07 '12 at 15:05