The question that this was marked as a duplicate of does not answer the question. I have already grouped the results by consecutive values successfully in my example below. My question is how to populate a new column with the first value of each group. Notice that state_start_value is not consecutive and so the linked answer does not apply.
I have a DataFrame with 2 columns. One for the value and one for state. I need to add another column which has the initial value at the start of the state, but I can't figure out how to do it.
from this:
df
value state
0 1 0
1 2 0
2 3 -1
3 4 -1
4 5 -1
5 6 0
6 7 1
7 8 0
8 9 0
I need:
value state state_start_value
0 1 0 1
1 2 0 1
2 3 -1 3
3 4 -1 3
4 5 -1 3
5 6 0 6
6 7 1 7
7 8 0 8
8 9 0 8
I've tried adding column called state_count, so I can then group by that and get the initial state of each group. The resulting numbers are correct, they don't line up with the current rows. And there must be a better way.
df['state_count'] = (df.state.diff() != 0).cumsum()
df['state_start_value'] = df.groupby('state_count')['value'].first()
yields
df
value state state_count state_start_value
0 1 0 1 NaN
1 2 0 1 1.0
2 3 -1 2 3.0
3 4 -1 2 6.0
4 5 -1 2 7.0
5 6 0 3 8.0
6 7 1 4 NaN
7 8 0 5 NaN
8 9 0 5 NaN