I need to convert multiple CSV files (with different encodings) into UTF-8.

Here is my code:

#find encoding and if not in UTF-8 convert it

import os
import sys
import glob
import chardet
import codecs

myFiles = glob.glob('/mypath/*.csv')

csv_encoding = []

for file in myFiles:
  with open(file, 'rb') as opened_file:
     bytes_file=opened_file.read()
     result=chardet.detect(bytes_file)
     my_encoding=result['encoding']
     csv_encoding.append(my_encoding)
        
print(csv_encoding)

for file in myFiles:
  if csv_encoding in ['utf-8', 'ascii']:
    print(file + ' in utf-8 encoding')
  else:
    with codecs.open(file, 'r') as file_for_conversion:
      read_file_for_conversion = file_for_conversion.read()
    with codecs.open(file, 'w', 'utf-8') as converted_file:
       converted_file.write(read_file_for_conversion)
    print(file +' converted to utf-8')

When I try to run this code I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 5057: invalid continuation byte

Can someone help me? Thanks!!!

aline
  • Does this answer your question? [How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte"](https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte) – FoggyDay Jun 21 '20 at 16:56
  • `my_encoding` in your second for-loop always has the last value from the first for-loop, which is unlikely to be correct. – thebjorn Jun 21 '20 at 16:59
  • Well, when you read the file, specify the encoding. – Tarik Jun 21 '20 at 17:30
  • The problem is I have about 20 csv files with different encodings that I need to convert to utf-8 weekly in order to work with them. My idea is to automate this process. – aline Jun 21 '20 at 17:45
  • @aline - Did lenz's response help? If so, please be sure to "upvote" and "accept" it. Otherwise, please update your post with what else you've tried and where you're blocked. – FoggyDay Jun 24 '20 at 16:22

1 Answer

You need to zip the lists myFiles and csv_encoding to get their values aligned:

for file, encoding in zip(myFiles, csv_encoding):
    ...

And you need to specify that value in the open() call:

    ...
    with codecs.open(file, 'r', encoding=encoding) as file_for_conversion:

Note: in Python 3 there's no need to use the codecs module for opening files. Just use the built-in open function and specify the encoding with the encoding parameter.
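
Putting those pieces together, here is a minimal sketch of what the second loop could look like with the built-in open function. The function names convert_to_utf8 and convert_all are made up for illustration, and the None check assumes that chardet may fail to detect an encoding for some files. It also assumes you are fine with overwriting the files in place, as in the question.

```python
def convert_to_utf8(path, source_encoding):
    """Re-encode one file in place from source_encoding to UTF-8."""
    # newline='' keeps the original line endings untouched on read and write
    with open(path, 'r', encoding=source_encoding, newline='') as src:
        text = src.read()
    with open(path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)


def convert_all(paths, encodings):
    """Walk the two lists in parallel and convert what is not UTF-8 yet."""
    for path, encoding in zip(paths, encodings):
        if encoding is None:
            # assumption: detection can come back empty, so skip the file
            print(path + ' skipped: encoding not detected')
        elif encoding.lower() in ('utf-8', 'ascii'):
            # ASCII is a subset of UTF-8, nothing to do
            print(path + ' in utf-8 encoding')
        else:
            convert_to_utf8(path, encoding)
            print(path + ' converted to utf-8')
```

With the lists built in the question, the call would be convert_all(myFiles, csv_encoding). Keep in mind that the detected encoding is only a guess, so it may be worth keeping a backup of the originals until you have checked the output.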

lenz