I have multiple CSV files which I want to concat. The files may or may not have the same columns. If columns are not the same, the data in different columns should be empty.
Example csv file1 has this data.
name, age
abc, 21
dsd, 20
csv file2 has this data
height, pay
5'5, 2000
5'10, 1000
Now I want to merge these two. For Example, they have millions of rows and I can't read them entirely. I want output in this style
name, age, height, pay
abc, 21, None, None
dsd, 20, None, None
, 5'5, 2000
, 5'10, 1000
Note that columns have null values in them if they don't have data. I have this code but it isn't working. Appending column names with each chunk and not adding empty columns.
import pandas as pd
def common_member(a, b):
a_set = set(a)
b_set = set(b)
if (a_set & b_set):
return True
else:
return False
all_files = ["/home/shahid/work-folders/Ai-Studio-input/mixed_csv/ludwig_train.csv", "/home/shahid/work-folders/Ai-Studio-input/mixed_csv/ludwig_train (copy).csv"]
merged_columns = []
for file in all_files:
df = pd.read_csv(open(file), float_precision='round_trip', chunksize=1000)
for chunk in df:
stop = True
if stop ==True:
merged_columns+= list(chunk.columns)
break
stop= False
print(set(merged_columns))
for file in all_files:
df = pd.read_csv(open(file), float_precision='round_trip', chunksize=1000)
for chunk in df:
if list(chunk.columns) == set(merged_columns):
chunk.to_csv("/home/shahid/work-folders/Ai-Studio-input/mixed_csv/merged.csv", mode='a', index=False)