0

Another attempt to calculate an 'elapsedSeconds' column based upon a condition applied to the previous 'payArea' value.

With this dataframe:

> 0  location_day  timestamp            payArea    Name   ratePay  elapsedSeconds  
> 1   2019-12-10  2019-12-10 12:00:01    Area1     Amy    12.25       365    
> 2   2019-12-10  2019-12-10 12:00:06    Area1     Amy    12.25         5  
> 3   2019-12-10  2019-12-10 12:00:16    Area1     Amy    12.25        10 
> 4   2019-12-17  2019-12-17 15:00:16    Area2     Amy    12.25     10800
> 5   2019-12-17  2019-12-17 15:00:36    Area2     Amy     8.75        20
> 6   2019-12-17  2019-12-17 15:00:40    Area2     Amy    12.25         4  
> 7   2019-12-17  2019-12-17 15:00:50    Area2     Amy     8.75        10
> 8   2019-12-17  2019-12-17 15:01:10    Area1     Amy    12.25       380   
> 9   2019-12-17  2019-12-17 15:01:45    Area1     Amy    12.25        35  
> 10  2019-12-17  2019-12-17 15:02:01    Area1     Amy    12.25        16 

Desired logic to apply: IF 'payArea' is equal to the previous 'payArea' value THEN calculate difference between timestamp in seconds, else if 'payArea' is not equal to previous 'payArea' value then 'elapsedSeconds' = 0.

This should basically "reset" the elapsedSeconds = 0 at any point the condition is NOT met.

Desired output:

> 0  location_day  timestamp            payArea    Name   ratePay  elapsedSeconds  
> 1   2019-12-10  2019-12-10 12:00:01    Area1     Amy    12.25         0    
> 2   2019-12-10  2019-12-10 12:00:06    Area1     Amy    12.25         5  
> 3   2019-12-10  2019-12-10 12:00:16    Area1     Amy    12.25        10 
> 4   2019-12-17  2019-12-17 15:00:16    Area2     Amy    12.25         0
> 5   2019-12-17  2019-12-17 15:00:36    Area2     Amy     8.75        20
> 6   2019-12-17  2019-12-17 15:00:40    Area2     Amy    12.25         4  
> 7   2019-12-17  2019-12-17 15:00:50    Area2     Amy     8.75        10
> 8   2019-12-17  2019-12-17 15:01:10    Area1     Amy    12.25         0 
> 9   2019-12-17  2019-12-17 15:01:45    Area1     Amy    12.25        35  
> 10  2019-12-17  2019-12-17 15:02:01    Area1     Amy    12.25        16 

I was previously calculating the elapsedSeconds with this,

#first get the elapsedSeconds as datetime
df['elapsedSeconds'] = df.sort('timestamp').groupby(['name','payArea'])['timestamp'].diff()/1000

#now convert the diff value as float type
df['elapsedSeconds'] = ((df['elapsedSeconds'] / np.timedelta64(1, 's')) *1000).astype(float)
user2309282
  • 121
  • 1
  • 9
  • It looks close but there doesn't seem to be any condition being applied in that example. That is in my desired output you can see "Area1" repeats again after "Area2" since the df is sorted on timestamp. – user2309282 Feb 03 '20 at 20:45
  • For now, I have stepped back and applied the desired logic to the source data (JSON) and can move on to summarizing the output as needed. – user2309282 Feb 03 '20 at 22:04

0 Answers0