I have several dataframes df1, df2, ... with duplicate data and partly overlapping columns and rows (see below).
How can I combine all of these dataframes into one dataframe?
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1,2], 'B': [4,5]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [5,6], 'C': [8,9]}, index=['b', 'c'])
df3 = pd.DataFrame({'A': [2,3], 'B': [5,6]}, index=['b', 'c'])
df4 = pd.DataFrame({'C': [7,8]}, index=['a', 'b'])
df5 = pd.DataFrame({'A': [1], 'B': [4], 'C': [7]}, index=['a'])
....
Added: an example of the data structure
   A  B  C
a  1  4  7
b  2  5  8
c  3  6  9
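For reference, one possible way to arrive at that structure from the frames above (just a sketch using combine_first, not necessarily the fastest) is:

from functools import reduce

dfs = [df1, df2, df3, df4, df5]

# combine_first keeps the values of the left frame and only fills in
# what is missing from the right one, so folding it over the list
# yields the union of all rows and columns
combined = reduce(lambda left, right: left.combine_first(right), dfs)
print(combined)
#      A    B    C
# a  1.0  4.0  7.0
# b  2.0  5.0  8.0
# c  3.0  6.0  9.0

The values come out as floats because NaNs appear during the intermediate alignment steps.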
Added: what I am really looking for is a more efficient way to do what the following script does; it is really slow for big dataframes
dfs = [df1, df2, df3, df4, df5]

# collect the union of all column and row labels
cols, rows = [], []
for df in dfs:
    cols = cols + df.columns.tolist()
    rows = rows + df.index.tolist()
cols = np.unique(cols)
rows = np.unique(rows)

# create an empty frame and copy the values over cell by cell
merged_dfs = pd.DataFrame(data=np.nan, columns=cols, index=rows)
for df in dfs:
    for col in df.columns:
        for row in df.index:
            merged_dfs[col][row] = df[col][row]
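For comparison, a possible shortcut (just a sketch, not benchmarked here) is to concatenate everything once and then collapse duplicate index labels, which avoids touching the cells one at a time:

dfs = [df1, df2, df3, df4, df5]

# stack all frames on top of each other (missing columns become NaN),
# then keep the first non-NaN value for every index label and column
merged_dfs = pd.concat(dfs).groupby(level=0).first()

This assumes that overlapping cells really do hold identical values, as in the example data above.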
Fast and easy solution (added 23 Dec 2015)
dfs = [df1, df2, df3, df4, df5]

# create empty DataFrame with all cols and rows
cols, rows = [], []
for df_i in dfs:
    cols = cols + df_i.columns.tolist()
    rows = rows + df_i.index.tolist()
cols = np.unique(cols)
rows = np.unique(rows)
df = pd.DataFrame(data=np.nan, columns=cols, index=rows)

# fill DataFrame block-wise instead of cell by cell
for df_i in dfs:
    df.loc[df_i.index, df_i.columns] = df_i.values
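If this is needed more than once, the same approach fits into a small helper (merge_frames is just a hypothetical name, not a pandas function):

def merge_frames(frames):
    # build the union of all row and column labels once ...
    cols = np.unique(np.concatenate([f.columns for f in frames]))
    rows = np.unique(np.concatenate([f.index for f in frames]))
    out = pd.DataFrame(np.nan, index=rows, columns=cols)
    # ... then fill block-wise with .loc; later frames overwrite earlier
    # ones on overlapping cells, which is fine when the data are duplicates
    for f in frames:
        out.loc[f.index, f.columns] = f.values
    return out

merged = merge_frames([df1, df2, df3, df4, df5])
print(merged)
#      A    B    C
# a  1.0  4.0  7.0
# b  2.0  5.0  8.0
# c  3.0  6.0  9.0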