6

Suppose I have 2 dataframes with overlapping column and index names that look as such:

  A B C D
A 0 1 0 1
B 0 1 1 0
C 1 0 1 0
D 0 0 0 1

  A C D E
A 1 0 0 0
B 0 1 0 0
D 0 0 0 0
E 1 0 0 1

I want to combine these two dataframes into one such that cells with the same column and index names are combined. The end result should look like this:

  A B C D E
A 1 1 0 1 0
B 0 1 1 0 0
C 1 0 1 0 0
D 0 0 0 1 0
E 1 0 0 0 1

I've tried using the Pandas.concat method but it only concatenates along one of the axes.

Ethan Li
  • 85
  • 6

2 Answers2

22

align and np.maximum

  • pandas.DataFrame.align will produce a copy of the calling DataFrame and the argument DataFrame with their index and column attributes aligned and return them as a tuple of two DataFrame
  • Pass both to numpy.maximum which will conveniently respect that these are pandas.DataFrame objects and return a new DataFrame with the appropriate maximal values.

np.maximum(*df1.align(df2, fill_value=0))

   A  B  C  D  E
A  1  1  0  1  0
B  0  1  1  0  0
C  1  0  1  0  0
D  0  0  0  1  0
E  1  0  0  0  1
piRSquared
  • 285,575
  • 57
  • 475
  • 624
6

How about:

(df1.add(df2, fill_value=0)
    .fillna(0)
    .gt(0)
    .astype(int))

output:

    A   B   C   D   E
A   1   1   0   1   0
B   0   1   1   0   0
C   1   0   1   0   0
D   0   0   0   1   0
E   1   0   0   0   1
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • It looks like the code worked, thanks! I have a few follow up questions if you don't mind answering. How does the add function behave differently than the concat function in terms of searching the rows/columns when both use an input field? Also, what is the significance of the gt function in this situation? – Ethan Li Jul 12 '19 at 19:43
  • 1
    I don't know the answer to the first question. I guess it is equivalent to do: `pd.concat((df1,df2), sort=False).groupby(level=0).sum()`. For the role of `gt(0)`, the sum returns `int` values, and you want them to be either `1` or `0` (at row B, col C), so `1+1 = 2 -> 1`. – Quang Hoang Jul 12 '19 at 19:48
  • I see, thank you. To my understanding, gt(0) does the same thing as replace(_>1_,1) in this situation. The pandas documentation page for gt isn't very clear to me. – Ethan Li Jul 12 '19 at 19:54
  • Yes, `gt(0)` checks if each value is **g**reater **t**han `0` or not. Similarly, there are `ge`, `lt`, `le`, `eq`, and `ne`. Can you guess what they mean :-) – Quang Hoang Jul 12 '19 at 19:56
  • @QuangHoang `fillna(0)` not necessary since you check if `> 0`. That evaluates to `False` for `np.nan` too. So `df1.add(df2, fill_value=0).gt(0).astype(int)` – piRSquared Jul 12 '19 at 21:30