I have two `pandas.DataFrame`s with overlapping columns and indices, like
```python
import pandas

X = pandas.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", None, "B2"]},
                     index=[0, 1, 2])
Y = pandas.DataFrame({"A": [V, "A3"], "B": ["B1", "B3"], "C": ["C1", "C3"]},
                     index=[1, 3])
```
I would like to extend `X` by the values in `Y` wherever data is missing, keeping the same columns. That is, if `V == "A1"` or `pandas.isnull(V)`, I'd like to obtain

```
>>> X.fill_from(Y)
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
```
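For the compatible case, the desired output above is what `DataFrame.combine_first` produces: non-null values of the calling frame are kept, its nulls are filled from the other frame, and rows that exist only in the other frame are appended. A minimal sketch, assuming `V = "A1"` and dropping `Y`'s extra column `C` with `reindex` first (the name `Y_cols` is just for illustration):

```python
import pandas as pd

V = "A1"  # assumption: the compatible case from the question
X = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", None, "B2"]},
                 index=[0, 1, 2])
Y = pd.DataFrame({"A": [V, "A3"], "B": ["B1", "B3"], "C": ["C1", "C3"]},
                 index=[1, 3])

# Restrict Y to X's columns so that "C" is not carried into the result.
Y_cols = Y.reindex(columns=X.columns)

# Non-null values of X are kept, nulls in X are filled from Y,
# and rows that exist only in Y (label 3) are appended.
filled = X.combine_first(Y_cols)
print(filled)
```

Because `combine_first` always prefers the caller's non-null values, running this with a conflicting `V` such as `"A9"` would quietly keep `A1` in row 1 instead of raising, so on its own it only covers the filling part.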
The value `B1` has been filled from `Y` because the previous value, `None`, is a null value in pandas. Row `3` has been added because none of its values were given in `X`, since `X` had no such row.

If `V` is neither `"A1"` nor null, I want an exception to be raised saying that the data frames contain incompatible data.
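Both requirements together can be sketched as a free function. This is a hypothetical stand-in for the wished-for `X.fill_from(Y)` (pandas has no such method); it treats a cell as conflicting only when both frames hold a non-null value there and the values differ, and then delegates the filling to `DataFrame.combine_first`. The `make_y` helper and the choice of `ValueError` are assumptions for illustration.

```python
import pandas as pd


def fill_from(x: pd.DataFrame, y: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the wished-for X.fill_from(Y)."""
    # Keep x's columns only; columns that exist only in y (like "C") are dropped.
    y = y.reindex(columns=x.columns)

    # Only rows present in both frames can conflict.
    common = x.index.intersection(y.index)
    a, b = x.loc[common], y.loc[common]

    # A cell conflicts when both sides are non-null and unequal;
    # a null on either side is treated as compatible.
    conflict = a.notna() & b.notna() & (a != b)
    if conflict.to_numpy().any():
        bad_rows = conflict.index[conflict.any(axis=1).to_numpy()].tolist()
        raise ValueError(f"incompatible data in rows {bad_rows}")

    # Fill nulls in x from y and append rows that only y has.
    return x.combine_first(y)


X = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", None, "B2"]},
                 index=[0, 1, 2])


def make_y(v):
    """Build Y for a given value of the placeholder V."""
    return pd.DataFrame({"A": [v, "A3"], "B": ["B1", "B3"], "C": ["C1", "C3"]},
                        index=[1, 3])


print(fill_from(X, make_y("A1")))  # same table for make_y(None)
try:
    fill_from(X, make_y("A9"))
except ValueError as exc:
    print(exc)  # incompatible data in rows [1]
```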
If I were sure my data had no missing values, `pandas.concat((X, Y), join_axes=[X.columns])` would do the extension, and `DataFrame.index.get_duplicates()` would tell me which rows to check for mismatches.
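Note that `join_axes` was removed from `pandas.concat` in pandas 1.0, and `Index.get_duplicates()` was removed in the same release (the release notes suggest `idx[idx.duplicated()].unique()` instead). A sketch of the same concat-then-check approach with current APIs, assuming complete data (here `X`'s row 1 is given `"B1"` and `V == "A1"`, so nothing is null):

```python
import pandas as pd

# Assumption: complete data, i.e. X's missing B value in row 1 is "B1"
# and V == "A1", so no row contains a null.
X = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]},
                 index=[0, 1, 2])
Y = pd.DataFrame({"A": ["A1", "A3"], "B": ["B1", "B3"], "C": ["C1", "C3"]},
                 index=[1, 3])

# Stand-in for join_axes=[X.columns]: give Y exactly X's columns first.
stacked = pd.concat([X, Y.reindex(columns=X.columns)])

# Stand-in for get_duplicates(): each label that occurs more than once.
dup_labels = stacked.index[stacked.index.duplicated()].unique()

# With complete data, all rows sharing a label must be identical.
for label in dup_labels:
    if len(stacked.loc[[label]].drop_duplicates()) > 1:
        raise ValueError(f"mismatching rows for index {label!r}")

# Keep the first row for each label.
extended = stacked[~stacked.index.duplicated()]
print(extended)
```

With the question's original `X`, where row 1 has `None` in column `B`, this strict comparison raises for label 1 even though the data are compatible.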
The hard part is making sure that missing data is not treated as different from present data, but can be filled in. I don't see how to do that without iterating over every duplicated pair from `get_duplicates()` and copying data manually.
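One way to avoid the explicit loop, offered as a sketch rather than a recipe tested against every dtype: stack both frames with `pandas.concat` and group by the index. `GroupBy.nunique()` ignores nulls by default, so a null next to a value never counts as a mismatch, and `GroupBy.first()` returns the first non-null value per column in each group, which does the filling. The function name `fill_from_grouped`, the `make_y` helper, and the `ValueError` are assumptions for illustration.

```python
import pandas as pd


def fill_from_grouped(x: pd.DataFrame, y: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: extend x with y's rows and fill x's nulls,
    raising if the two frames disagree on a non-null value."""
    # Stack both frames with x's columns only; shared labels appear twice.
    stacked = pd.concat([x, y.reindex(columns=x.columns)])
    groups = stacked.groupby(level=0)

    # nunique() drops nulls by default, so null-vs-value is not a conflict,
    # but two different non-null values for the same label and column are.
    if (groups.nunique() > 1).to_numpy().any():
        raise ValueError("X and Y contain incompatible data")

    # first() takes the first non-null value per column in each group:
    # x's value when present, otherwise y's. Group labels come out sorted.
    return groups.first()


X = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", None, "B2"]},
                 index=[0, 1, 2])


def make_y(v):
    """Build Y for a given value of the placeholder V."""
    return pd.DataFrame({"A": [v, "A3"], "B": ["B1", "B3"], "C": ["C1", "C3"]},
                        index=[1, 3])


print(fill_from_grouped(X, make_y("A1")))
```

Since `x` is concatenated first, `first()` prefers `X`'s value whenever one is present, mirroring the `combine_first` behaviour without iterating over the duplicated labels.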
This question with a similar title is not really related. Using `X[X.isnull()] = Y`, as in this other question, does not work with the `get_duplicates()` mismatch check.