1

The problem

Hello, I am trying to compare two Series element wise to get a Series with values "True" and "False". These are the two columns I am trying to compare:

    Loan        Date 1      Date2
405 1022    2020-02-29  2019-10-31
406 1022    2020-02-29  2019-11-30
407 1022    2020-02-29  2019-12-31
408 1022    2020-02-29  2020-01-31
405 1030    2020-05-31  2020-01-31
406 1030    2020-05-31  2020-02-29
407 1030    2020-05-31  2020-03-31
408 1030    2020-05-31  2020-04-30

What I want to achieve is:

For each Loan take the last row, if "Date 1" is equal to "Date 2", then leave "Date 2" as it is, otherwise make "Date 2" equal to "Date 1

My attempt

a = df[["Loan","Date 1"]].groupby("Loan").tail(1)
b = df[["Loan","Date 2"]].groupby("Loan").tail(1)

df["new_date"] = np.where(a==b,b,a)

Also tried

(a==b).any() and (a==b).all()

Error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

2 Answers2

1

Lets groupby on Loan and aggregate using tail, then use boolean indexing with loc to replace the values in Date2 where Date2 is not equal to Date1:

d = df.groupby('Loan').tail(1)
d.loc[d['Date1'].ne(d['Date2']), 'Date2'] = d['Date1']

     Loan      Date1      Date2
408  1022 2020-02-29 2020-02-29
408  1030 2020-05-31 2020-05-31
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
0

You can simply replace Date2 by Date1 unconditially to cirumvent the error and get the data:

import pandas as pd
from io import StringIO

csv_string = StringIO("""Loan        Date1      Date2
1022    2020-02-29  2019-10-31
1022    2020-02-29  2019-11-30
1022    2020-02-29  2019-12-31
1022    2020-02-29  2020-01-31
1030    2020-05-31  2020-01-31
1030    2020-05-31  2020-02-29
1030    2020-05-31  2020-03-31
1030    2020-05-31  2020-04-30""" )

df = pd.read_csv(csv_string, sep=" ", skipinitialspace=True)

grp = df.groupby(["Loan", "Date1"]).tail(1)
grp["Date2"] = grp["Date1"]

print(grp)

Output:

   Loan       Date1       Date2
3  1022  2020-02-29  2020-02-29
7  1030  2020-05-31  2020-05-31

See ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69