I am reading the headers of CSV files in a folder.

code:

# mypath = directory containing the CSV files
for each_file in listdir(mypath):
    with open(mypath + "//" + each_file) as f:
        first_line = f.readline().strip().split(",")

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte

Environment:

Spyder, Python 3

I don't understand the encoding error, since I have not specified any encoding myself.

data_person
  • Check the following link; it may help: https://stackoverflow.com/questions/48540170/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python-for-bulgarian-cyr – Aayush Bhatnagar May 30 '18 at 05:38

3 Answers

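Try building the path with a single slash '/':

with open(mypath + "/" + each_file) as f:

Another possible cause is that the CSV file is not encoded as UTF-8 (for example, it may have been saved as UTF-16 or in a Windows code page such as cp1252). It would be easier to tell if you posted a sample of the CSV file too.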
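
To illustrate the point about the file not being UTF-8: when open() is called without an encoding argument, Python decodes with the locale's preferred encoding, which appears to be UTF-8 in the asker's setup. The sketch below uses made-up bytes (an assumption, not the asker's data) to show that byte 0x80 is rejected by UTF-8 but decodes as the euro sign under cp1252, one plausible source encoding for a file exported on Windows.

```python
import locale

# Encoding that open() falls back to when no encoding= argument is given.
print(locale.getpreferredencoding(False))

# Hypothetical header bytes containing the byte from the error message.
raw = b"price_\x80"

try:
    raw.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    # 0x80 can only be a continuation byte in UTF-8, never a start byte.
    reason = exc.reason

# In cp1252 the same byte maps to the euro sign (U+20AC).
decoded = raw.decode("cp1252")
print(reason)
```

This does not prove the asker's files are cp1252; it only shows why the error can appear even though the code never mentions an encoding.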

Sudip Ghimire

Try passing an explicit encoding when opening the file in the with statement. The code below worked fine for me. If 'utf-8' still fails, try other encodings (for example 'cp1252' or 'latin-1') and see whether one of them works.

for each_file in listdir(path):
    with open(path + "//" + each_file, encoding='utf-8') as f:
        first_line = f.readline().strip().split(",")
        print(each_file, ' --> ', first_line)

Also, see this link on checking the encoding of a CSV file. Hope it helps.

How to check encoding of CSV file
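
The "try different encodings" advice can be sketched as a small helper. This is a minimal illustration, not a reliable detector: the function name read_header and the candidate list are assumptions, and 'latin-1' is placed last because it accepts any byte and therefore never raises, even when it is the wrong choice.

```python
import os
import tempfile


def read_header(file_path, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (header_fields, encoding) for the first encoding that decodes the first line."""
    for enc in encodings:
        try:
            with open(file_path, encoding=enc) as f:
                first_line = f.readline()
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file; try the next one
        return first_line.strip().split(","), enc
    raise ValueError("none of the candidate encodings could decode " + file_path)


# Demo with a hypothetical file whose header contains byte 0x80 (euro sign in cp1252).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "wb") as fh:
        fh.write(b"name,price_\x80\n1,2\n")
    header, used = read_header(sample)
    print(used, len(header))
```

'utf-8-sig' is tried first because it reads plain UTF-8 as well as UTF-8 with a byte order mark, which some spreadsheet exports add.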

Happy Coding :)

Strik3r

The built-in os.path.join provides a convenient way to join two or more path components without worrying about platform-specific separators ('/' or '\').

import os

files = os.listdir(path)

for file in files:
    with open(os.path.join(path, file), encoding='utf-8') as f:
        first_line = f.readline().strip().split(",")  # readline() already returns a str
        print(file, ' --> ', first_line)
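
As a possible extension of the os.path.join approach, the sketch below skips files without a .csv extension and lets the csv module parse the header, so quoted fields that contain commas stay intact. The function name csv_headers and the sample file contents are assumptions for illustration; the encoding argument still has to match the actual files.

```python
import csv
import os
import tempfile


def csv_headers(folder, encoding="utf-8"):
    """Map each .csv file name in `folder` to its list of header fields."""
    headers = {}
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip entries without a .csv extension
        with open(os.path.join(folder, name), encoding=encoding, newline="") as f:
            # next(..., []) yields an empty header for an empty file
            headers[name] = next(csv.reader(f), [])
    return headers


# Demo with hypothetical files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "people.csv"), "w", encoding="utf-8") as fh:
        fh.write('id,"last, first"\n1,"Doe, Jane"\n')
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as fh:
        fh.write("not a csv\n")
    result = csv_headers(tmp)
    print(result)
```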