I have the following dataset:
ID AAA BBB CCC DDD
1234 {'2015-01-01': 1} {'2016-01-01': 1, {'2015-01-02': 1} {'2016-01-02': 1}
'2016-02-15': 2}
1235 {'2017-11-05': 1, {'2018-01-05': 1} NaN {'2017-01-06': 1}
'2018-06-05': 1}
In the cell, ‘key’ is the date when someone is hospitalized and ‘value’ is the number of days.
I need to create a new column for hospitalization ('Yes' or 'No').
The condition to be 'yes':
- The column [AAA or BBB] as well as the column [CCC or DDD] both should have filled-in dates.
- The date in the column [CCC or DDD] should be the next day of the date in the column [AAA or BBB].
For example, if [AAA or BBB] has a date of January 01, 2020. For 'yes', the date in [CCC or DDD] should be January 02, 2020.
Desired output:
ID AAA BBB CCC DDD Hospitalized
1234 {'2015-01-01': 1} {'2016-01-01': 1, {'2015-01-02': 1} {'2016-01-02': 1} Yes
'2016-02-15': 2}
1235 {'2017-11-05': 1, {'2018-01-05': 1} NaN NaN No
'2018-06-05': 1}
1236 {'2017-11-05': 1, {'2018-01-05': 1} NaN {'2018-01-06': 1} Yes
'2018-06-05': 1}
I have tried the following code, but this captures if the dates are present but doesn't capture the timestamp.
df['hospitalized'] = (df
.apply(lambda r: 'yes' if (1 if pd.notna(r.loc[['AAA', 'BBB']]).any() else 0) +
(1 if pd.notna(r.loc[['CCC', 'DDD']]).any() else 0) > 1
else 'no', axis=1))
Any suggestions would be appreciated. Thanks!