How do I fill a DataFrame from another DataFrame, adding rows and replacing nulls?

Question

I have two pandas.DataFrames with overlapping columns and indices, like

X = pandas.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", None, "B2"]},
                     index=[0, 1, 2])
Y = pandas.DataFrame({"A": [V, "A3"], "B": ["B1", "B3"], "C": ["C1", "C3"]},
                     index=[1, 3])

I would like to extend X by the values in Y, whereever data is missing, keeping the same columns. That is

if V=="A1" or pandas.isnull(V), I'd like to obtain
```
>>> X.fill_from(Y)
    A     B
0  A0    B0
1  A1    B1
2  A2    B2
3  A3    B3
```
The value B1 has been filled from Y because the previous value, None, is a null value in pandas. Row 3 has been added because all values in that row were not given in X, because X had no such row.
If V!="A1", I want to get an exception raised concerning the fact that the data frames contain incompatible data.

If I was sure my data had no missing data, pandas.concat((X, Y), join_axes=[X.columns]) would do the extension, and DataFrame.index.get_duplicates() would tell me if there were mis-matching rows.

The hard part is making sure that missing data is not taken to be different from present data, but can be filled in, and I don't see how to do it without iterating over every possible pair in get_duplicates() and copying data manually.

This question with a similar title is not really related. Using X[X.isnull()] = Y, as in this other question, does not work with the get_duplicates() mis-matching check.

`combine_first` is looking very good for matching data, thank you. Now I need to see how to get errors thrown for non-matching data (because comparing data frames is not obvious either). — Anaphory, Jun 13 '16 at 14:07
As @IanS suggested you can call `combine_first` `X.combine_first(Y)[['A','B']]` to achieve what you want — EdChum, Jun 13 '16 at 14:10
If you `merge` with `indicator=True` then you can see whether rows are left/right/both `X.merge(Y, how='outer', indicator=True)` — EdChum, Jun 13 '16 at 14:12

score 0 · Accepted Answer · answered Jun 13 '16 at 14:23

The combine_first method is half the deal, thanks to @IanS for pointing it out.

>>> X.combine_first(Y)[list(X.columns)]
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3

Now, if V is nice, we should get the same result when combine_firsting in the other direction, otherwise we will get something different. And because NaNs do not compare nicely, the whole function is

def combine_first_if_matching(X, Y):
    filled = X.combine_first(Y)[list(X.columns)]
    reverse_filled = Y.combine_first(X)[list(X.columns)]
    if ((filled == reverse_filled) | (filled.isnull())).all().all():
        return filled
    else:
        raise ValueError("Overlap of data frames did not match")

How do I fill a DataFrame from another DataFrame, adding rows and replacing nulls?

1 Answers1