2

The data below is based on GPS coordinates of a van, whether the ignition was on/off, and how far the van was from a target location at a given time. I want to determine whether a van was at or near a location (<300), whether the ignition was turned off, and if both conditions are true, the time duration of the stay.

In the example below, I visualize rows 1-4 as being "grouped" together since they are consecutive rows where the distance was <300. Row 5 is "grouped" on its own since it was >300, and rows 6-8 are "grouped" together since they are consecutive rows with distance <300.

Accordingly, since the ignition was turned off in rows 1-4, I want to calculate the time duration (since the van "stopped" at the location for a given amount of time). However, the other two groups (row 5 and rows 6-8) should not have a time duration calculation since the ignition was never turned off in those groupings.

df
AcctID   On_Off    Distance  Timestamp
123      On        230       12:00
123      On        30        12:02
123      Off       29        12:05
123      Off       35        12:10
123      On        3000      12:13
123      On        100       12:20
123      On        95        12:22
123      On        240       12:28

I'm able to classify whether the Distance is less than 300 (Within_Distance), but determining whether the ignition was off for at least one of the rows in the grouping has me stumped. Here's what the final dataframe should look like:

df['Within_Distance'] = np.where(df['Distance']<300, "Yes", "No")

df
AcctID   On_Off    Distance  Timestamp   Within_Distance    Was_Off    Within_Distance_and_Was_Off
123      On        230       12:20       Yes                Yes        Yes
123      On        30        12:02       Yes                Yes        Yes
123      Off       29        12:05       Yes                Yes        Yes
123      Off       35        12:10       Yes                Yes        Yes
123      On        3000      12:13       No                 No         No
123      On        100       12:20       Yes                No         No
123      On        95        12:22       Yes                No         No
123      On        240       12:28       Yes                No         No

Thanks in advance!

Walt Reed
  • 1,336
  • 2
  • 17
  • 26

2 Answers2

3

Let's try:

df['Within_Distance'] = np.where(df['Distance']<300, "Yes", "No")

df['Was_Off'] = df.groupby((df.Distance > 300).diff().fillna(0).cumsum())['On_Off'].transform(lambda x: 'Yes' if (x == 'Off').any() else 'No')

df['Within_Distinace_and_Was_Off']  = np.where((df['Within_Distance'] == 'Yes') & (df['Was_Off'] == 'Yes'),'Yes','No')

Output:

   AcctID On_Off  Distance Timestamp Within_Distance Was_Off  \
0     123     On       230     12:00             Yes     Yes   
1     123     On        30     12:02             Yes     Yes   
2     123    Off        29     12:05             Yes     Yes   
3     123    Off        35     12:10             Yes     Yes   
4     123     On      3000     12:13              No      No   
5     123     On       100     12:20             Yes      No   
6     123     On        95     12:22             Yes      No   
7     123     On       240     12:28             Yes      No   

  Within_Distinace_and_Was_Off  
0                          Yes  
1                          Yes  
2                          Yes  
3                          Yes  
4                           No  
5                           No  
6                           No  
7                           No  
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
1

First, set up a boolean field to work with

df['Off'] = df['On_Off'] == 'Off'

Then construct a field that identifies consecutive rows for groupby, as shown here

(df['Within_Distance'] != df['Within_Distance'].shift()).cumsum()

And use .any to identify where the boolean is true for any row in the groupby:

df['Was_Off'] = df.groupby((df['Within_Distance'] != df['Within_Distance'].shift()).cumsum())['Off'].transform(any)
Out[31]: 
   AcctID On_Off  Distance Timestamp Within_Distance    Off  Was_Off
0     123     On       230     12:00             Yes  False     True
1     123     On        30     12:02             Yes  False     True
2     123    Off        29     12:05             Yes   True     True
3     123    Off        35     12:10             Yes   True     True
4     123     On      3000     12:13              No  False    False
5     123     On       100     12:20             Yes  False    False
6     123     On        95     12:22             Yes  False    False
7     123     On       240     12:28             Yes  False    False
EFT
  • 2,359
  • 1
  • 10
  • 11
  • This almost worked for me. Unfortunately, `Was_Off` returned True when `Within_Distance == 'No'` and `Off == 'False'`, I only wanted `Was_Off` to return True when `Within_Distance == 'Yes'` and `Off == 'True'`. Thanks for your help! – Walt Reed Jun 20 '17 at 16:28
  • Was there a row it was grouped with where `Within_Distance == 'No'` and `Off == True`? If not, that's really odd. (If so, that's what `Within_Distance_and_Was_Off ` is for.) Could you post a sample dataframe where that occurs, so I can puzzle at it more? – EFT Jun 20 '17 at 16:35
  • `AcctId On_Off Distance Within_Distance Off Was_Off 123 Off 350 No True True 123 Off 420 No True True 123 On 100 No False False` – Walt Reed Jun 20 '17 at 19:42
  • @WaltReed besides the last `Within_Distance` value, which looks like it should be `Yes` there, what looks wrong? – EFT Jun 20 '17 at 19:48
  • In the instance where `Within_Distance` == 'No' and `Off` == 'False, `Was_Off` should equal "False" and not "True" – Walt Reed Jun 20 '17 at 20:08
  • @WaltReed It was adjacent to other instances of `Within_Distance == 'No'` for which `Off == True`, and so was grouped with them. It seems to me that the behavior you describe is more appropriate to the `Within_Distance_and_Was_Off` field, which I did not include, since it didn't seem to be the focus of your question. For comparison with an undisputedly correct solution, @ScottBoston's answer behaves similarly if you adjust the `Distance` field for the third row of your example to be greater than 300 and look at the `Was_Off` field. – EFT Jun 20 '17 at 20:33