
I have an Excel sheet that I read with the pd.read_excel() method. When I try to add a new column using my_frame['Test'] = my_frame['My Column'], it throws an error.

I tried reading the Excel file as UTF-8 with pd.read_excel(..., encoding='utf-8'), but it did not work. The preprocess_price_file(temp_df) function returns a sliced dataframe and performs some pre-processing, which includes dropping some NA rows.

import glob
import pandas as pd

prod_dfs = []
product_price_files = glob.glob('files/product_price/*.xlsx')
for c_file in product_price_files:
    temp_df = pd.read_excel(c_file, encoding="utf-8")
    temp_df = self.preprocess_price_file(temp_df)
    prod_dfs.append(temp_df)
# concatenate once after the loop instead of on every iteration
prods_df = pd.concat(prod_dfs)
prods_df['Test'] = prods_df['My Column']
return prods_df

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 6: ordinal not in range(128)

Salman Ahsan

1 Answer


I have run across this issue before, and this helped: How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte".

The tl;dr version is that you cannot assume the incoming encoding; it is always better to be explicit rather than implicit.
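For example, here is a minimal sketch of decoding raw bytes with an explicit encoding instead of relying on the default codec. The file path and the latin-1 fallback are placeholders for illustration, not something from your code:

# Sketch: decode explicitly instead of letting the default 'ascii' codec fail.
# The path and the 'latin-1' fallback are hypothetical examples.
with open('files/product_price/export.csv', 'rb') as f:
    raw = f.read()
try:
    text = raw.decode('utf-8')
except UnicodeDecodeError:
    text = raw.decode('latin-1')  # byte 0xf8 is 'ø' in latin-1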

If you need to find the native encoding, you can try the following (probably not the most pythonic way):

with open(<file path>, "r") as f:
    print(f)

The printed io wrapper object will include the file encoding in its output.
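If you would rather detect the encoding programmatically, the third-party chardet package can guess it from raw bytes. This is a sketch, not part of the original answer; install it with pip install chardet and substitute your own file path:

import chardet

# Read a sample of raw bytes and let chardet guess the encoding.
with open('files/product_price/export.csv', 'rb') as f:  # placeholder path
    raw = f.read(100000)

guess = chardet.detect(raw)
print(guess['encoding'], guess['confidence'])  # e.g. ISO-8859-1 0.73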

merchantofam