I have a very large number of CSV files that I need to merge into a single one. I can't collect them in a list and concatenate at the end because of memory restrictions, even though I have 64 GB of RAM.
To avoid holding everything in memory, I'm streaming the data into a file using:
entidad_csv = folder_entidad / f"{entidad.name.lower()}.csv"
for f in tqdm(files):
    df = descomprime(f)
    df.to_csv(entidad_csv, index=False, mode="a", header=not entidad_csv.exists())
But this breaks when a column is missing from one of the files: the rows are appended as-is, so their values no longer line up with the header written by the first file. I don't know beforehand which columns are present in each file, so I need the merged file to contain every column that appears in any of them.
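Here is a minimal sketch that reproduces the problem, assuming only pandas. The two small frames and the helper name `append_frames` are made up for illustration; they stand in for what `descomprime` returns for two of my real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def append_frames(frames, out_path):
    """Append each frame to out_path the same way as the loop above."""
    for df in frames:
        df.to_csv(out_path, index=False, mode="a", header=not out_path.exists())


# Hypothetical stand-ins for two decompressed files.
first = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
second = pd.DataFrame({"a": [4], "c": [6]})  # column "b" is missing here

out_path = Path(tempfile.mkdtemp()) / "merged.csv"
append_frames([first, second], out_path)

merged_text = out_path.read_text()
print(merged_text)
# a,b,c
# 1,2,3
# 4,6
```

The last row has only two fields, so the value 6 ends up under `b` instead of `c`. What I would want instead is `4,,6`.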
Thanks in advance