I'm trying to read monthly CSV files, but for some reason I keep getting this error.

This is my code below.

import os

import pandas as pd

df = pd.DataFrame()
 
for file in os.listdir("Performance_Data"):
    if file.endswith(".csv"):
        df = pd.concat([df , pd.read_csv(os.path.join("Performance_Data", file))], axis=0 )
        
df.head()

What do I do?

Chris
David
  • It may not be a UTF-8 encoded file. You can open it in `notepad++`, which shows the encoding at the bottom of the window. Also make sure it is in fact comma-delimited and not tab- or `|`-delimited. If you see a different encoding, just pass `encoding='utf-16'` (or whatever it is) to `read_csv`. – Chris Dec 16 '21 at 14:39
  • Am I the only one who can't read the error? – SNR Dec 16 '21 at 14:46
  • Why don't you accept the answers? – Prophet Jan 08 '22 at 16:26

1 Answer

By default, pandas assumes that your file is encoded in UTF-8. Your file is encoded in Windows-1252. You can tell pandas to use this encoding by passing the `encoding` argument:

pd.read_csv(os.path.join("Performance_Data", file), encoding='cp1252')
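To see why the encoding matters, here is a minimal sketch using only the standard library. The sample text is made up for illustration and is not taken from your files: a character such as `é` is stored as a single byte in Windows-1252, and that byte sequence is not valid UTF-8.

```python
# Hypothetical sample line, encoded the way a Windows-1252 file would store it.
raw = "Café,10\n".encode("cp1252")  # 'é' becomes the single byte 0xE9

# Decoding those bytes as UTF-8 fails, because 0xE9 followed by ','
# is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were written in recovers the text.
text = raw.decode("cp1252")

print(utf8_ok)
print(text)
```

Passing `encoding='cp1252'` to `read_csv` makes pandas decode the file the same way as the last step here.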

Detecting the encoding of a file automatically is a bit tricky, but the `chardet` package can make a best guess. For your code, it could look like this:

import os

import chardet
import pandas as pd

df = pd.DataFrame()

for file in os.listdir("Performance_Data"):
    if file.endswith(".csv"):
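        # Open the file inside Performance_Data, not in the current directory
        with open(os.path.join("Performance_Data", file), "rb") as fp: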
            encoding = chardet.detect(fp.read())["encoding"]
        df = pd.concat(
            [
                df,
                pd.read_csv(os.path.join("Performance_Data", file), encoding=encoding),
            ],
            axis=0,
        )

df.head()
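If you would rather not add a dependency, one alternative (not what `chardet` does, just a simple fallback) is to try a short list of likely encodings and keep the first one that decodes the whole file. The helper name `pick_encoding` and the default candidate list are assumptions made for this sketch:

```python
from pathlib import Path


def pick_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file, else None.

    Hypothetical helper: it only checks that the bytes decode without error,
    not that the decoded text is what the author of the file intended.
    """
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None
```

The result can be passed as `encoding=` to `pd.read_csv`. The order of the candidates matters: Windows-1252 accepts almost any byte sequence, so UTF-8 should be tried first.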

user23952