Pandas: How to reshape multiple dataframes into a common form?

Question

(Reproducible examples and feeble attempts at the end)

I've got two dataframes, df1 and df2:

df1:

    Col_A   Col_B   Col_D
1   NaN     21      NaN
2   10      NaN     33
4   12      23      38

df2:

    Col_C   Col_E
2   22      44
3   NaN     45
5   4       48

And I'd like to find a common form like this:

df_common:

    Col_A   Col_B   Col_C   Col_D   Col_E
1   NaN     NaN     NaN     NaN     NaN
2   NaN     NaN     NaN     NaN     NaN
3   NaN     NaN     NaN     NaN     NaN
4   NaN     NaN     NaN     NaN     NaN
5   NaN     NaN     NaN     NaN     NaN

...where I've got the union of all column names and row indexes, and NaN values only:

Then I'd like to fill in the values of df1 and df2 (still in two separate tables) so that I'll end up with:

df1_desired

    Col_A   Col_B   Col_C   Col_D   Col_E
1   NaN     21      NaN     NaN     NaN
2   10      NaN     NaN     33      NaN
3   NaN     NaN     NaN     NaN     NaN
4   12      23      NaN     NaN     NaN
5   NaN     NaN     NaN     38      NaN

df2_resired:

    Col_A   Col_B   Col_C   Col_D   Col_E
1   NaN     NaN     NaN     NaN     NaN
2   NaN     NaN     22      NaN     44
3   NaN     NaN     NaN     NaN     35
4   NaN     NaN     NaN     NaN     NaN
5   NaN     NaN     4       NaN     48

I've tried various attempts with pd.merge() and df.update() with no success

But I've come to terms with the fact that I don't even know what to properly call this particular challenge. Thank you for any suggestions!

Reproducible examples:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 4: '12'},
        'Col_B': {1: '21', 2: np.nan, 4: '23'},
        'Col_D': {1: np.nan, 2: '33', 4: '38'}})

df2 = pd.DataFrame({'Col_C': {2: '22', 3: np.nan, 5: '4'},
                    'Col_E': {2: 44, 3: 45, 5: 48}})

df1_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 3: np.nan, 4: '12', 5: np.nan},
                        'Col_B': {1: '23', 2: np.nan, 3: np.nan, 4: '23', 5: np.nan},
                        'Col_C': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                        'Col_D': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'},
                        'Col_E': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}})

df2_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                        'Col_B': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                        'Col_C': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'},
                        'Col_D': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                        'Col_E': {1: np.nan, 2: '44', 3: '35', 4: np.nan, 5: '48'}})

# find the commons
common_cols = sorted(list(set().union(list(df1),list(df2))))
common_rows = sorted(list(set().union(list(df1.index),list(df2.index))))
df_common = pd.DataFrame(np.nan, index=common_rows, columns=common_cols)

# attempt at reshaping df1 with pd.merge
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
df1_reshaped = pd.merge(df_common, df1, how='left', left_index=True, right_index=True)

# attempt at dropping duplicates for df1
#df1_reshaped = df1_reshaped[df1_reshaped.columns.drop(list(df1_reshaped.filter(regex='_x')))]
#df1_reshaped.columns = df_common.columns

# attempt with df.update()
# https://stackoverflow.com/questions/9787853/join-or-merge-with-overwrite-in-pandas
df1_updated=df_common.update(df1)

score 2 · Accepted Answer · edited May 19 '20 at 14:05

2

You can use:

s=df1.combine_first(df2)
df1=df1.reindex_like(s)
   Col_A  Col_B  Col_C  Col_D  Col_E
1    NaN   21.0    NaN    NaN    NaN
2   10.0    NaN    NaN   33.0    NaN
3    NaN    NaN    NaN    NaN    NaN
4   12.0   23.0    NaN   38.0    NaN
5    NaN    NaN    NaN    NaN    NaN

edited May 19 '20 at 14:05

halfer

19,824
17
99
186

answered May 03 '20 at 01:34

BENY

317,841
20
164
234

Thank you so much! The approach is perfect, although it per the example should have been `s=df_common.combine_first(df1); df1=df1.reindex_like(s)`. (and btw ***not*** my downvote!). – vestland May 03 '20 at 01:57
1

@vestland yw:-) happy coding – BENY May 03 '20 at 01:58
1

Someone probably tested your approach and didn't get that the approach was spot on but the reference not 100% correct as per the question. But I feel that that responsibility lies on *my* shoulders for not having explained the challenge properly. I hope the downvoter sees this and makes things right. – vestland May 03 '20 at 02:02

Pandas: How to reshape multiple dataframes into a common form?

1 Answers1