I have 3000 Excel files. I want to get the headers of each file and store them in a CSV. However, I am running into a parsing error:
'utf-8' codec can't decode byte 0xfa in position 1: invalid start byte
I have already seen this post. It does not solve the problem: UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
import csv
import glob
import pandas as pd

all_files = glob.glob("Converted Excels/*.xlsx")
file = all_files[0]

# Try 1: read the first row with the csv module
columns = []
with open(file, "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        columns.append([row])
        break

# Try 2: read only the header row with pandas
df = pd.read_csv(file, header=0, nrows=1)
df
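Both attempts open the file as text, whereas an .xlsx workbook is a zip archive, so one thing worth checking is what the file actually starts with. The sketch below is only a diagnostic, not a fix. `sniff_format` and the constant names are hypothetical; the two signatures are the standard zip and OLE2 magic bytes.

```python
# Well-known file signatures ("magic bytes").
ZIP_MAGIC = b"PK\x03\x04"                          # zip container, used by .xlsx
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 container, used by legacy .xls


def sniff_format(path):
    """Guess the container format from the first 8 bytes of a file."""
    with open(path, "rb") as fh:  # binary mode, so no text decoding is attempted
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"
```

Calling `sniff_format(file)` on the first file would show whether it is a zip container at all. A result of "unknown" would suggest that the conversion produced something other than a real workbook.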
Here is an example file. https://docs.google.com/spreadsheets/d/194QD14g_L0NQK6j3yO2Et2ZzycfQDzJXu7vdlr20owA/edit?usp=sharing
I converted this file from PDF to Excel, and I specified encoding="utf8" during the conversion.
How can I get the header from this file?
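If the files do turn out to be real .xlsx workbooks, one possible way to collect the headers is `pd.read_excel` with `nrows=0`, which returns only the column names. This is a sketch under that assumption. `read_header`, `write_headers`, and the `reader` parameter are hypothetical names, and reading .xlsx with pandas needs the openpyxl package installed.

```python
import csv


def read_header(path):
    """Return the column names of the first sheet of an Excel workbook."""
    # Imported here so write_headers can also be used with a different reader.
    import pandas as pd

    # nrows=0 asks pandas for the header row only, with no data rows.
    return [str(name) for name in pd.read_excel(path, nrows=0).columns]


def write_headers(paths, out_csv, reader=read_header):
    """Write one CSV row per input file: the file path followed by its headers."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for path in paths:
            writer.writerow([path, *reader(path)])
```

With the folder from the code above, this could be called as `write_headers(glob.glob("Converted Excels/*.xlsx"), "headers.csv")`, where the output name `headers.csv` is just a placeholder. The `reader` argument is there so that a different function can be swapped in if some files are not .xlsx after all.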
Thanks a lot for your help.