
I'm trying to put together two answers on the site to figure out my situation, but no luck so far.

Essentially, I have several CSVs with the same columns but different encodings. That means that when I try the approach here, I also have to iterate through my list of encodings, which I generated this way (in IPython):

encodings_raw = !chardetect data/*.csv
encodings = [x.split('csv: ')[1].split(' with')[0] for x in encodings_raw]

The value of encodings is:

['Windows-1252', 'UTF-8-SIG', 'ISO-8859-1', 'Windows-1252', 'UTF-8-SIG', 'UTF-8-SIG', 'Windows-1252', 'Windows-1252', 'Windows-1252', 'Windows-1252', 'Windows-1252']
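Outside IPython the `!chardetect` capture is not available, so here is a rough sketch of the same parsing in plain Python. It assumes each chardetect output line looks like `path: ENCODING with confidence SCORE`. The function name `parse_chardetect_line` and the sample lines are made up for illustration.

```python
import re

# Assumed line format: "<path>: <encoding> with confidence <score>"
_LINE = re.compile(
    r"^(?P<path>.+): (?P<encoding>\S+) with confidence (?P<score>[\d.]+)$"
)


def parse_chardetect_line(line):
    """Return (path, encoding) for one output line, or None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return match.group("path"), match.group("encoding")


# Hypothetical sample output, not taken from the real data directory
sample = [
    "data/a.csv: Windows-1252 with confidence 0.73",
    "data/b.csv: UTF-8-SIG with confidence 1.0",
]
pairs = [parse_chardetect_line(line) for line in sample]
```

Keeping the path next to its encoding, rather than building a bare list of encodings, makes it harder for the two lists to drift out of order later.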

I tried a bunch of things, but as I typed out the question I figured out the answer, so I'll just post it below.

Khashir

1 Answer


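You have to zip the list of files with the list of encodings and pass each pair to pd.read_csv. Here data_files is the list of CSV paths, in the same order as encodings: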

df = pd.concat((pd.read_csv(f, encoding=e) for f,e in zip(data_files,encodings)))
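For anyone who wants to try this end to end, here is a minimal self-contained sketch. It writes two small CSVs in different encodings to a temporary directory and then reads them back with the same `zip` pattern. The file names, sample rows, and the `ignore_index=True` argument are my own additions for the demo, not part of the original data.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files: same columns, different encodings
tmp = tempfile.mkdtemp()
samples = {
    "a.csv": ("Windows-1252", "name,city\nZoë,Málaga\n"),
    "b.csv": ("UTF-8-SIG", "name,city\nJosé,Kraków\n"),
}
for name, (enc, text) in samples.items():
    with open(os.path.join(tmp, name), "w", encoding=enc, newline="") as fh:
        fh.write(text)

# Sort the paths so their order is predictable and matches `encodings`
data_files = sorted(glob.glob(os.path.join(tmp, "*.csv")))
encodings = ["Windows-1252", "UTF-8-SIG"]

# Read each file with its own encoding and stack the results
df = pd.concat(
    (pd.read_csv(f, encoding=e) for f, e in zip(data_files, encodings)),
    ignore_index=True,  # optional: gives the combined frame a fresh 0..n-1 index
)
```

Note that `zip` pairs items purely by position, so the two lists must be in the same order. It also stops silently at the shorter list, so it is worth checking that `len(data_files) == len(encodings)` first.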

Khashir