4

Starting from this previous question: Pandas merge two dataframes with different columns

If I concat two dataframes (A & B) that have some of the same columns, but also have columns that are not present in both, in the resulting dataframe the entries for the columns that are not common to both A & B have a value of NaN. Is there a way to make these entries have another default value?

I would rather not simply replace NaN after the concat operation as there may be NaN values in the original dataframes that I want to preserve.

Here are two example dataframes:

hello world how extra 1 2 3 g 5 -666 11 h 13 NaN i 23 7 29 j

extra you how 1.1 31 b -666 37 c 1.3 41 d NaN 43 -666 1.7 -666

If for example the default value for use in the disjoint columns is "W4L" instead of NaN, the desired result would be:

hello world how extra you 1 2 3 g W4L 5 -666 11 h W4L 13 NaN i W4L 23 7 29 j W4L W4L W4L 31 1.1 W4L W4L 37 b -666 W4L W4L 41 c 1.3 W4L W4L 43 d NaN W4L W4L -666 -666 1.7

Community
  • 1
  • 1
dllahr
  • 421
  • 1
  • 4
  • 13

1 Answers1

3

A possible solution is 'conform' the indices before concatenating both dataframes, and in that step it is possible to define a fill_value:

common_columns = df1.columns.union(df2.columns)

df1 = df1.reindex(columns=common_columns, fill_value='W4L')
df2 = df2.reindex(columns=common_columns, fill_value='W4L')

pd.concat([df1, df2])

With your example data:

In [32]: common_columns = df1.columns.union(df2.columns)

In [34]: df1 = df1.reindex(columns=common_columns, fill_value='4WL')

In [35]: df1
Out[35]:
  extra  hello  how world  you
0     g      1    3     2  W4L
1     h      5   11  -666  W4L
2     i     13  NaN        W4L
3     j     23   29     7  W4L

In [36]: df2 = df2.reindex(columns=common_columns, fill_value='W4L')

In [37]: pd.concat([df1, df2])
Out[37]:
  extra hello  how world  you
0     g     1    3     2  W4L
1     h     5   11  -666  W4L
2     i    13  NaN        W4L
3     j    23   29     7  W4L
0         W4L   31   W4L  1.1
1     b   W4L   37   W4L -666
2     c   W4L   41   W4L  1.3
3     d   W4L   43   W4L  NaN
4  -666   W4L -666   W4L  1.7

You can see that the original NaNs are preserved.

joris
  • 133,120
  • 36
  • 247
  • 202