I'm in way over my head here. I could do this in R, but I have no idea how to do it in Python (which I'm learning).
So... I have a data frame of participants who are sampled four times each day for 30 days. The issue is that if a participant filled out two surveys within a certain time range, time_of_day for both could be 2, so there are cases where time_of_day has the same value for the same participant on the same date (which it shouldn't). I checked with the director of the lab, and this happened because the inputs weren't time-locked.
To resolve this, I think I need some kind of if/else logic: if there is a duplicate, change the first of the two to the preceding value (if there is a duplicate for 2, make the first time_of_day a 1). I don't know how to do this within the lambda/if-else statement I started below. Where I'm really stuck is that this also needs to be grouped by participant and date, and I have no idea how to combine an if/else statement with dropping duplicates inside a lambda in Python.
data\
.assign(time_of_day = data['time_of_day'].apply(lambda x: ... if ... else ...))
So for the data below, for participant 21 on 2019-12-08, the 1st observation's time_of_day (currently a 2) would become a 1, and the 3rd observation's time_of_day (currently a 4) would become a 3.
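To make the rule concrete, here is a minimal sketch of what I think it amounts to, with no lambda at all. It rests on several assumptions: the rows are already in chronological order within each participant and date, at most two rows ever share a slot, and the frame `toy`, its string dates, and the column name `time_of_day_fixed` are made up for illustration. It may well not be the right approach for the full data.

```python
import pandas as pd

# Hypothetical toy frame mirroring participant 21 (dates kept as strings for brevity).
toy = pd.DataFrame({
    "pid": ["21"] * 6,
    "date": ["2019-12-07", "2019-12-08", "2019-12-08",
             "2019-12-08", "2019-12-08", "2019-12-09"],
    "time_of_day": [4, 2, 2, 4, 4, 2],
})

# Passing all three columns as the subset does the "group by participant and
# date" part implicitly. keep="last" marks every matching row except the last
# one, i.e. the earlier member of each colliding pair.
is_earlier_dup = toy.duplicated(subset=["pid", "date", "time_of_day"], keep="last")

# Shift the earlier member of each pair down by one slot (True counts as 1).
toy["time_of_day_fixed"] = toy["time_of_day"] - is_earlier_dup.astype(int)

print(toy)
```

If three or more rows shared the same slot, this would still leave duplicates behind, so it only covers the two-surveys-in-one-window case described above.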
I couldn't find an equivalent of R's dput(), so I've converted a minimal reproducible data frame to a dictionary (which I read about here: Print pandas data frame for reproducible example (equivalent to dput in R)). Just convert it back to a data frame with data = pd.DataFrame.from_dict(df_dict):
import datetime
import pandas as pd
from pandas import Timestamp

df_dict = {'pid': {0: '21',
1: '21',
2: '21',
3: '21',
4: '21',
5: '21',
200: '26',
201: '26',
202: '26',
203: '26',
204: '26'},
'datestamp': {0: Timestamp('2019-12-07 21:33:11'),
1: Timestamp('2019-12-08 13:32:16'),
2: Timestamp('2019-12-08 13:33:41'),
3: Timestamp('2019-12-08 19:54:22'),
4: Timestamp('2019-12-08 19:55:24'),
5: Timestamp('2019-12-09 12:11:24'),
200: Timestamp('2020-02-12 20:16:33'),
201: Timestamp('2020-02-13 08:37:21'),
202: Timestamp('2020-02-13 13:24:20'),
203: Timestamp('2020-02-13 17:05:27'),
204: Timestamp('2020-02-13 20:02:12')},
'date': {0: datetime.date(2019, 12, 7),
1: datetime.date(2019, 12, 8),
2: datetime.date(2019, 12, 8),
3: datetime.date(2019, 12, 8),
4: datetime.date(2019, 12, 8),
5: datetime.date(2019, 12, 9),
200: datetime.date(2020, 2, 12),
201: datetime.date(2020, 2, 13),
202: datetime.date(2020, 2, 13),
203: datetime.date(2020, 2, 13),
204: datetime.date(2020, 2, 13)},
'time_of_day': {0: 4,
1: 2,
2: 2,
3: 4,
4: 4,
5: 2,
200: 4,
201: 1,
202: 2,
203: 3,
204: 4},
'depressed': {0: 3,
1: 3,
2: 4,
3: 4,
4: 4,
5: 4,
200: 3,
201: 3,
202: 1,
203: 2,
204: 2},
'prev_night_sleep': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 11.35,
200: 7.166666666666667,
201: 10.18333333333333,
202: 10.18333333333333,
203: 10.18333333333333,
204: 10.18333333333333}}
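To show which rows collide, here is a rough sketch that flags every row sharing a participant, date, and time_of_day with another row. It uses a trimmed copy of the example (only the three columns that matter, same index labels) so that it runs on its own. The names `trimmed`, `keys`, and `collisions` are just placeholders.

```python
import datetime
import pandas as pd

# Trimmed copy of the example data: only the columns needed to spot collisions.
trimmed = pd.DataFrame(
    {
        "pid": ["21"] * 6 + ["26"] * 5,
        "date": (
            [datetime.date(2019, 12, 7)]
            + [datetime.date(2019, 12, 8)] * 4
            + [datetime.date(2019, 12, 9)]
            + [datetime.date(2020, 2, 12)]
            + [datetime.date(2020, 2, 13)] * 4
        ),
        "time_of_day": [4, 2, 2, 4, 4, 2, 4, 1, 2, 3, 4],
    },
    index=[0, 1, 2, 3, 4, 5, 200, 201, 202, 203, 204],
)

# keep=False marks every member of a colliding group, not just the repeats.
keys = ["pid", "date", "time_of_day"]
collisions = trimmed[trimmed.duplicated(subset=keys, keep=False)]

print(collisions)
```

Only participant 21's four rows on 2019-12-08 are flagged. Participant 26's rows on 2020-02-13 already run 1 through 4, so they are left alone.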