Using Pandas, how can we copy the initial value of a state to each row as a new column

Question

The question that this was marked as a duplicate of does not answer the question. I have already grouped the results by consecutive values successfully in my example below. My question is how to populate a new column with the first value of each group. Notice that state_start_value is not consecutive and so the linked answer does not apply.

I have a DataFrame with 2 columns. One for the value and one for state. I need to add another column which has the initial value at the start of the state, but I can't figure out how to do it.

from this:

df
    value   state
0   1   0
1   2   0
2   3   -1
3   4   -1
4   5   -1
5   6   0
6   7   1
7   8   0
8   9   0

I need:

    value   state   state_start_value
0   1   0   1
1   2   0   1
2   3   -1  3
3   4   -1  3
4   5   -1  3
5   6   0   6
6   7   1   7
7   8   0   8
8   9   0   8

I've tried adding column called state_count, so I can then group by that and get the initial state of each group. The resulting numbers are correct, they don't line up with the current rows. And there must be a better way.

df['state_count'] = (df.state.diff() != 0).cumsum()
df['state_start_value'] = df.groupby('state_count')['value'].first()

yields

df
    value   state   state_count state_start_value
0   1   0   1   NaN
1   2   0   1   1.0
2   3   -1  2   3.0
3   4   -1  2   6.0
4   5   -1  2   7.0
5   6   0   3   8.0
6   7   1   4   NaN
7   8   0   5   NaN
8   9   0   5   NaN

This is not a duplicate of that question. In my example I have already grouped by consecutive values. The question is how to populate a new column with the first value of each state group. — B. Bogart, Sep 08 '18 at 13:09

Andrey Portnoy · Accepted Answer · 2018-09-08T04:51:47.643

0

df['state_start_value'] = df.groupby((df.state != df.state.shift()).cumsum())['value'].transform('first')

Or to unpack it a bit:

change = df.state != df.state.shift()
cumulative_change = change.cumsum()
values_gb = df.groupby(cumulative_change)['value']
df['start_value'] = values_gb.transform('first')

The 'group by contiguous blocks' trick is borrowed from this answer by @jezrael.

edited Sep 08 '18 at 04:51

answered Sep 08 '18 at 04:45

Andrey Portnoy

1,430
15
24

Using Pandas, how can we copy the initial value of a state to each row as a new column

1 Answers1