1

I would like to match two columns, and if all of them are True. I need to drop the subset from a very large dataset.

I am running into 2 problem:

  1. when rowA.exprityDate = NaN and rowB.exprityDate = NaN, my condition returns false. I believe its's because NaN is a special condition. Is there any way around this. My matching condition state below:
  2. Is there any way to conduct a cross matching condition? rowA.baseCptyId[0] == rowB.extCptyId[1] and rowA.extcptyId[1] == RowB.extcptyID[0]...It's a bit more clearer using the table below where the Matched column is False for the first two rows.

Here is the matching condtion I am using:

    combine_series = pd.DataFrame(dict(rowA = rowA, rowB = rowB))
    combine_series['Matched'] = np.where(combine_series['rowA'] == combine_series['rowB'], True, False)

Here is the resulting Matched Column

                                    rowA          rowB Matched
baseCptyID                       2231200          5900   False
extCptyID                           5900       2231200   False
notional                         3.4e+07       3.4e+07    True
startDate                     2015-05-29    2015-05-29    True
expiryDate                           NaN           NaN   False
settlementDate                2020-06-29    2020-06-29    True
rate                             0.03375       0.03375    True
spread                               NaN           NaN   False
paymentFreq                           PA            PA    True
resetFreq                             PA            PA    True
modelUsed                       FixedLeg      FixedLeg    True
PayoutCCY                            AUD           AUD    True
DayCountConv                ACT/ACT ICMA  ACT/ACT ICMA    True
join_column                      2231200       2231200    True
Karun
  • 561
  • 7
  • 23
  • Possible duplicate of [Pandas DataFrames with NaNs equality comparison](http://stackoverflow.com/questions/19322506/pandas-dataframes-with-nans-equality-comparison) – kennytm May 07 '17 at 14:03

1 Answers1

0

Let's try this logic that takes advangate of NaN != NaN resolves as True.

df['Matched']=(df.rowA == df.rowB) | ((df.rowA != df.rowA) & (df.rowB != df.rowB))

Output:

                       rowA          rowB Matched
baseCptyID           2231200          5900   False
extCptyID               5900       2231200   False
notional             3.4e+07       3.4e+07    True
startDate         2015-05-29    2015-05-29    True
expiryDate               NaN           NaN    True
settlementDate    2020-06-29    2020-06-29    True
rate                 0.03375       0.03375    True
spread                   NaN           NaN    True
paymentFreq               PA            PA    True
resetFreq                 PA            PA    True
modelUsed           FixedLeg      FixedLeg    True
PayoutCCY                AUD           AUD    True
DayCountConv    ACT/ACT ICMA  ACT/ACT ICMA    True
join_column          2231200       2231200    True
Scott Boston
  • 147,308
  • 15
  • 139
  • 187