
I am looking at the velocity of a vehicle, and the only data I have tells me whether the velocity is stable, slowing down, or stopped (see the df below). There is a fourth state (speeding up), but it does not occur in this df.

As you can see, there are two "slowing down" periods. I am only interested in the data starting from the last "slowing down" period before "stopped".

How do I filter the data so that I can drop the first x rows that I am not interested in? Since the velocity values are always different, I cannot simply filter on values.

Hope you can help!

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],

  "Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}

df = pd.DataFrame(data)

df.head()
Mel
  • This should be useful https://stackoverflow.com/questions/40900195/pandas-cumulative-count – FloLie May 31 '21 at 08:34
  • So in other words you want the time/rows from when "Slowing Down" begins until it is "Stopped"? – chitown88 May 31 '21 at 08:37
  • @chitown88 Yes indeed, but only the last "slowing down" before "stopped", so ignoring the one that was there before (or multiple if that happens in another df). – Mel May 31 '21 at 08:40
  • @FloLie I have seen the cumulative count questions, but I am not interested in the first "slowing down' period at all, making cumcount irrelevant in this case. Thanks for commenting though! – Mel May 31 '21 at 08:41
  • You could use the cumcount, to find the last one, by getting the row number of the max value in a "slowing down" row – FloLie May 31 '21 at 08:45
  • Do you also want the "Stop" that follows the last "Slowing Down"? Or do you just want the "Slowing Down"? (Just trying to understand the requirements) – chitown88 May 31 '21 at 08:48
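FloLie's cumcount suggestion from the comments can be sketched end-to-end like this. Note this is just one reading of the comment: the backward scan to the start of the run is an addition, not something the comment spells out.

```python
import pandas as pd

# Minimal reproduction of the question's data (just the Values column)
df = pd.DataFrame({
    "Values": ["Stable"] * 4 + ["Slowing down"] * 3 + ["Stable"] * 3
              + ["Slowing down"] * 8 + ["Stopped"] * 3
})

# Number each row within its Values group; the "Slowing down" row with the
# highest count is the last "Slowing down" row overall
df["cc"] = df.groupby("Values").cumcount()
last_slow = df.loc[df["Values"] == "Slowing down", "cc"].idxmax()  # 17 for this data

# Walk backwards from that row to the start of its run
start = last_slow
while start > 0 and df.loc[start - 1, "Values"] == "Slowing down":
    start -= 1

# Everything from the start of the last slowing-down period onward
result = df.loc[start:].drop(columns="cc")
```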

3 Answers


You can label each run of consecutive identical values by comparing the column with its own shift and taking .cumsum(), then use .loc to keep the rows whose run label equals the maximum label among the "Slowing down" rows, i.e. the last slowing-down period:

df['SlowDownSeq'] = df['Values'].ne(df['Values'].shift()).cumsum()
df_selected = df.loc[df['SlowDownSeq'] ==  df.loc[df['Values'] == 'Slowing down', 'SlowDownSeq'].max()].drop('SlowDownSeq', axis=1)

Result:

print(df_selected)


       Date and Time        Values
10  2020-06-07 00:10  Slowing down
11  2020-06-07 00:11  Slowing down
12  2020-06-07 00:12  Slowing down
13  2020-06-07 00:13  Slowing down
14  2020-06-07 00:14  Slowing down
15  2020-06-07 00:15  Slowing down
16  2020-06-07 00:16  Slowing down
17  2020-06-07 00:17  Slowing down
SeaBean
  • Is there a way to also include the "stopped" values using this method? i.e. only delete everything previous to row 10, but keep everything after the slowing down part is finished? – Mel May 31 '21 at 09:24
  • 1
    @Mel Slightly fine-tuned the logics above. You can see the updated version above. For your follow-up question, yes, it's easy, just change a `==` to `>=` in the code above. i.e. `df_selected = df.loc[df['SlowDownSeq'] >= df.loc[df['Values'] == 'Slowing down', 'SlowDownSeq'].max()].drop('SlowDownSeq', axis=1)` – SeaBean May 31 '21 at 09:30
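The `>=` variant described in the comment above, written out as a self-contained snippet against the question's data:

```python
import pandas as pd

# Same data as the question, reduced to the Values column
df = pd.DataFrame({
    "Values": ["Stable"] * 4 + ["Slowing down"] * 3 + ["Stable"] * 3
              + ["Slowing down"] * 8 + ["Stopped"] * 3
})

# Label each run of consecutive identical values
df["SlowDownSeq"] = df["Values"].ne(df["Values"].shift()).cumsum()

# '>=' instead of '==' keeps the last slowing-down run AND everything after it
last_run = df.loc[df["Values"] == "Slowing down", "SlowDownSeq"].max()
df_selected = df.loc[df["SlowDownSeq"] >= last_run].drop("SlowDownSeq", axis=1)
# df_selected now holds the final "Slowing down" run plus the "Stopped" rows
```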

So from my understanding, you want the rows where 'Values' is 'Slowing down' followed immediately by 'Stopped'.

*Note: I see I misunderstood. You not only want that last row, but all of the preceding consecutive rows that make up the final "Slowing down" sequence. I'll still leave this solution up, but it looks like SeaBean got you what you needed.

What you can do is create another column, which I called 'Next_Value', that holds 'Values' shifted up one row. Then you can do a query/filter to find the rows where 'Values' == 'Slowing down' and 'Next_Value' == 'Stopped':

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],

  "Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}

df = pd.DataFrame(data)

df['Next_Value'] = df['Values'].shift(-1)

filtered_df = df.query('Values == "Slowing down" and Next_Value == "Stopped"')

If you are more familiar with this syntax rather than df.query(), use the below line:

filtered_df = df[(df['Values'] == "Slowing down") & (df['Next_Value'] == "Stopped")]

Output:

print(filtered_df)
       Date and Time        Values Next_Value
17  2020-06-07 00:17  Slowing down    Stopped
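The same shift trick applied in the other direction (comparing each row with the previous one) finds where each slowing-down run begins, which gets directly at the original "drop the first x rows" goal. This is a sketch building on the answer, not part of it:

```python
import pandas as pd

df = pd.DataFrame({
    "Values": ["Stable"] * 4 + ["Slowing down"] * 3 + ["Stable"] * 3
              + ["Slowing down"] * 8 + ["Stopped"] * 3
})

# Compare each row with the PREVIOUS one to find where a slowing-down run begins
df["Prev_Value"] = df["Values"].shift(1)
starts = df.query('Values == "Slowing down" and Prev_Value != "Slowing down"')

# The last such start is where the final slowing-down period begins
last_start = starts.index.max()  # 10 for this data
result = df.loc[last_start:].drop(columns="Prev_Value")
```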
chitown88
  • 1
    Thanks for your input! I think the above solution is slightly easier for the dataframe, but I will definitely keep this idea in mind as well :) – Mel May 31 '21 at 09:11
import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],

  "Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}

df = pd.DataFrame(data)

# Number each row within its Values group
df["slow_count"] = df.groupby("Values").cumcount()

# For this df, "Slowing down" is the most frequent value, so the overall max
# count falls on its last row; note this selects only the final "Slowing down"
# row, not the whole final period
a = df[(df["slow_count"] == df["slow_count"].max()) & (df["Values"] == "Slowing down")]


FloLie