I'm reading multiple CSV files from a folder. While reading them I get `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 21: invalid start byte`.

When I read a file on its own, I pass `encoding="ISO-8859-1"` to `pandas.read_csv(file_name, encoding=...)`. My final objective is to append all the files into a single data frame. This is the code I'm using:

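import glob

import pandas as pd

files = glob.glob('/path_name/*.csv')

df = None

for i, f in enumerate(files):
    if i == 0:
        # No encoding is passed here, so files that are not valid UTF-8
        # raise the UnicodeDecodeError quoted above.
        df = pd.read_csv(f)
        df['fname'] = f
    else:
        tmp = pd.read_csv(f)
        tmp['fname'] = f
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        df = pd.concat([df, tmp])

df.head()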
Bhaskar Dhariyal
  • Can't you use `"ISO-8859-1"` for all files? Or use `try/except` to catch the error and read with a different encoding. – furas May 02 '19 at 07:21
  • I can; I tried `df = pd.read_csv(f, encoding="ISO-8859-1")`, but then it cannot read the files and raises `NameError: name 'read_csv' is not defined` – Bhaskar Dhariyal May 02 '19 at 07:28
  • You get this error because there is no function `read_csv()`; you should use `pd.read_csv()` inside `else` – furas May 02 '19 at 07:30
  • I have edited the code above and also imported pandas – Bhaskar Dhariyal May 02 '19 at 07:32
  • Do you have your own function `def read_csv():`? Show it. The error says that no such function exists. I think you made a small mistake in the code and forgot `pd.` in `else` – furas May 02 '19 at 07:35
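
The `try/except` idea from the comments could look roughly like the sketch below. It is only an illustration: the helper names `read_csv_with_fallback` and `load_folder` are made up for this example, and the list of encodings to try is an assumption based on the one encoding mentioned in the question.

```python
import glob

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "ISO-8859-1")):
    """Hypothetical helper: try each encoding in turn, return the first success."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as exc:
            # This encoding cannot decode the file; remember why and try the next.
            last_error = exc
    raise last_error


def load_folder(pattern):
    """Hypothetical helper: read every matching CSV and stack them in one frame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        tmp = read_csv_with_fallback(path)
        tmp["fname"] = path  # keep track of which file each row came from
        frames.append(tmp)
    # A single concat at the end avoids growing the frame inside the loop.
    return pd.concat(frames, ignore_index=True)
```

Calling `df = load_folder('/path_name/*.csv')` would then replace the loop in the question. Note that `pd.concat` raises a `ValueError` if the pattern matches no files, and that ISO-8859-1 accepts every byte value, so it should come last in the list.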

1 Answer

Try adding errors='ignore' when you open the file. The read will then succeed, but any bytes that are not valid UTF-8 are silently dropped, so you will lose a few characters.

with open(path, encoding="utf8", errors='ignore') as f:
    df = pd.read_csv(f)  # read_csv also accepts an open file object
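
As a fuller sketch of how this might fit the loop from the question: the first part shows what `errors='ignore'` does to the `0xa0` byte from the traceback, and the second part applies it to every file. The function name `read_all_ignoring_errors` and the sample bytes are hypothetical and only serve the illustration.

```python
import glob

import pandas as pd

# 0xa0 is a non-breaking space in ISO-8859-1, but it is not a valid
# start byte in UTF-8, which is what the traceback complains about.
raw = b"10\xa0kg"
ignored = raw.decode("utf8", errors="ignore")  # the 0xa0 byte is dropped
latin1 = raw.decode("ISO-8859-1")              # the 0xa0 byte becomes U+00A0


def read_all_ignoring_errors(pattern):
    """Hypothetical helper: read all matching CSVs, skipping undecodable bytes."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf8", errors="ignore") as f:
            tmp = pd.read_csv(f)
        tmp["fname"] = path  # record the source file, as in the question
        frames.append(tmp)
    return pd.concat(frames, ignore_index=True)
```

If the dropped characters matter, passing `encoding="ISO-8859-1"` to `pd.read_csv` keeps them instead of discarding them.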
PirrenCode
  • Could you please share the full code? I don't know where I should add it. I will accept the answer if it works – Bhaskar Dhariyal May 02 '19 at 07:26
  • Sure! Please check this link for a detailed explanation: https://stackoverflow.com/questions/42339876/error-unicodedecodeerror-utf-8-codec-cant-decode-byte-0xff-in-position-0-in/42340744 – PirrenCode May 02 '19 at 09:08