
I have two dataframes that share some columns.
I'm trying to:

1) Merge the two dataframes, i.e. add the columns of df2 that df1 does not have:

diff = df2[df2.columns.difference(df1.columns)]
merged = pd.merge(df1, diff, how='outer', sort=False, on='ID')

Up to here, everything works as expected.

2) Replace the NaN values with the values from df2:

merged = merged[~merged.index.duplicated(keep='first')]
merged.fillna(value=df2)

And it is here that I get:

pandas.core.indexes.base.InvalidIndexError

I don't have any duplicates, and I can't find any information as to what can cause this.

Georgy
golanor

2 Answers


The solution to this problem is to use a different method, combine_first(). This way, each missing value is filled with the corresponding value from the other dataframe, as can be seen here: Merging together values within Series or DataFrame columns
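
A minimal sketch of that approach, assuming both frames carry an `ID` column that can serve as the index. The sample data and the names `left`, `right`, and `combined` are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: both frames share "ID" and "a"; only df2 has "b".
df1 = pd.DataFrame({"ID": [1, 2, 3], "a": [10.0, np.nan, 30.0]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "a": [11.0, 20.0, 40.0], "b": ["x", "y", "z"]})

# Index both frames by ID so rows are matched by key, not by position.
left = df1.set_index("ID")
right = df2.set_index("ID")

# Values already present in `left` are kept; its NaN cells are filled from `right`.
# The result covers the union of both frames' rows and columns.
combined = left.combine_first(right)
print(combined.reset_index())
```

Because combine_first() aligns on index and column labels, it appears to cover both steps from the question (adding the missing columns and filling the NaN values) in a single call.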

golanor

If the number of rows changes because of the merge, fillna can sometimes raise an error. Try the following:

merged.fillna(df2.groupby(level=0).transform("mean"))

related question

Venkatachalam