How to import multiple CSVs with different encodings into one data frame?

Question

I'm trying to put together two answers on the site to figure out my situation, but no luck so far.

Essentially, I have several CSVs with the same columns but different encodings. When I try the approach here, I also have to iterate through my list of encodings, which I generated this way:

encodings_raw = !chardetect data/*.csv
encodings = [x.split('csv: ')[1].split(' with')[0] for x in encodings_raw]

The value of encodings is:

['Windows-1252', 'UTF-8-SIG', 'ISO-8859-1', 'Windows-1252', 'UTF-8-SIG', 'UTF-8-SIG', 'Windows-1252', 'Windows-1252', 'Windows-1252', 'Windows-1252', 'Windows-1252']
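A list like the one above matches the files only by position. As a minimal sketch, the same output could instead be parsed into a mapping from file path to encoding, so each file keeps its own encoding. This assumes each line has the form `path: ENCODING with confidence N`, which is what the split in the snippet above implies. The sample lines and the `parse_chardetect` helper are hypothetical.

```python
# Hypothetical chardetect output, in the assumed
# "path: ENCODING with confidence N" form.
encodings_raw = [
    "data/a.csv: Windows-1252 with confidence 0.73",
    "data/b.csv: UTF-8-SIG with confidence 1.0",
]


def parse_chardetect(lines):
    """Map each file path to the encoding name reported for it."""
    result = {}
    for line in lines:
        # Split at the first ": " so the path and the report are separated.
        path, sep, rest = line.partition(": ")
        if not sep:
            continue  # skip lines that do not match the assumed format
        result[path] = rest.split(" with confidence")[0]
    return result


encoding_by_file = parse_chardetect(encodings_raw)
print(encoding_by_file)
```

A path that itself contains `": "` would be split in the wrong place, so this only suits simple file names.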

I tried a bunch of things, but as I typed out the question I figured out the answer, so I'll just post it below.

score 0 · Answer 1 · answered May 18 '22 at 04:50

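Zip the list of files with the list of encodings and pass each pair to pd.read_csv (this assumes data_files is in the same order as encodings):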

df = pd.concat((pd.read_csv(f, encoding=e) for f,e in zip(data_files,encodings)))
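For a version that can be run end to end, here is a small sketch of the same idea. It writes two throwaway CSVs with different encodings and reads them back with the zip approach. The file names, sample rows, and temp directory are made up for illustration. `ignore_index=True` is an optional addition that was not in the original line, and the sorted file list is assumed to match the order in which the encodings were detected.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data: same columns, different encodings.
tmp_dir = Path(tempfile.mkdtemp())
samples = {
    "a.csv": ("name,city\nZoë,Málaga\n", "Windows-1252"),
    "b.csv": ("name,city\nJosé,Köln\n", "UTF-8-SIG"),
}
for filename, (text, encoding) in samples.items():
    (tmp_dir / filename).write_bytes(text.encode(encoding))

# Sort for a deterministic order; assumed to line up with `encodings`.
data_files = sorted(tmp_dir.glob("*.csv"))
encodings = ["Windows-1252", "UTF-8-SIG"]

# Read each file with its own encoding, then stack the frames.
df = pd.concat(
    (pd.read_csv(f, encoding=e) for f, e in zip(data_files, encodings)),
    ignore_index=True,  # optional: renumber rows 0..n-1 across files
)
print(df)
```

Without `ignore_index=True`, each file's rows keep their own 0-based index, so index values repeat in the combined frame.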

Khashir
