I am trying to do the same as this answer, but with the difference that I'd like to ignore NaN
in some cases. For instance:
#df1
c1 c2 c3
0 a b 1
1 a c 2
2 a nan 1
3 b nan 3
4 c d 1
5 d e 3
#df2
c1 c2 c4
0 a nan 1
1 a c 2
2 a x 1
3 b nan 3
4 z y 2
#merged output based on [c1, c2], dropping instances
#with `NaN` unless both dataframes have `NaN`.
c1 c2 c3 c4
0 a b 1 1 #c1,c2 from df1 because df2 has a nan in c2
1 a c 2 2 #in both
2 a x 1 1 #c1,c2 from df2 because df1 has a nan in c2
3 b nan 3 3 #c1,c2 as found in both
4 c d 1 nan #from df1
5 d e 3 nan #from df1
6 z y nan 2 #from df2
NaN
s may come from either c1
or c2
, but for this example I kept it simpler.
I'm not sure what's the cleanest way to do this. I was thinking to merge based on [c1,c2]
, and then loop by rows with nan, but this will not be so direct. Do you see a better way to do it?
Edit - clarifying conditions
1. No duplicates are found anywhere.
2. No combination is performed between two rows if they both have values. c1
may not be combined with c2
, so order must be respected.
3. For the cases where one of the 2 dfs has a nan
in either c1
or c2
, find the rows in the other dataframe that don't have a full match on both c1
+c2
, and use it. For instance:
(a,c)
has a match in both so it is no longer discussed.(a,b)
is only indf1
. Nob
is found indf2.c2
. The only row indf2
with a known key and anan
is row0
so it is combined with this one. Note that order must be respected this is why(a,b) #df1
cannot be combined with any other row ofdf2
that also contains ab
.(a,x)
is only indf2
. Nox
is found indf1.c2
. The only row indf1
with one of the known keys with anan
is row with index2
.