I have a dataset of assignments per case. I'm trying to fill the last unassigned
from each case with the case_closed
date.
case_nb attorney cas_closed_date assigned last_event
1 A 2015-07-02 2015-07-02 2015-07-02
2 B 2015-09-10 2015-09-10 2015-09-10
3 C 2016-03-24 2016-03-24 2016-03-24
4 D 2018-06-07 2013-10-21 2014-02-06
4 C 2018-06-07 2013-09-13 2013-09-13
4 F 2018-06-07 2018-03-31 2018-04-05
(For replication:
df = pd.DataFrame({'assigned': {0: pd.Timestamp('2015-07-02'), 1: '2015-09-10', 2: '2016-03-24', 3: '2013-10-21', 4: '2013-09-13', 5: '2018-03-31'},
'attorney': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'C', 5: 'F'},
'cas_closed_date': {0: pd.Timestamp('2015-07-02'), 1: '2015-09-10', 2: '2016-03-24', 3: '2018-06-07', 4: '2018-06-07', 5: '2018-06-07'},
'case_nb': {0: 1, 1: 2, 2: 3, 3: 4, 4: 4, 5: 4},
'last_event': {0: pd.Timestamp('2015-07-02'), 1: '2015-09-10', 2: '2016-03-24', 3: '2014-02-06', 4: '2013-09-13', 5: '2018-04-05'}}
)
My dataset includes the case number, attorney assigned, the case closed date, the date the attorney was assigned, and the last event the attorney appeared on. For the above example, I'd want the last line to be
4 F 2018-06-07 2018-03-31 2018-06-07
I've seen a few ways of filling in NAs based on data in the same column, such as this question. But these ways use transform
, which I can't get to use multiple columns.
What I have so far works with apply
, but not transform
:
def fixdate(gp):
last_unasgn = gp.iloc[-1]['last_event']
if gp.iloc[-1]['cas_closed_date'] > last_unasgn:
return gp.iloc[-1]['cas_closed_date']
else:
return last_unasgn
asmt.groupby('evt_file_number').apply(lambda x: fixdate(x))
> 4 2018-06-07
Is there a way to use transform
? If not, what's the best way to use that aggregate data to fill in my original dataset?