
I have the below dataframe and want to count the number of times the value changed over time.

input dataframe:

class  date         value
A      2019-01-02   80
A      2019-02-02   80
A      2019-03-02   90
A      2019-04-02   20
A      2019-05-02   80
A      2019-06-02   Null
A      2019-06-03   70
A      2019-06-04   70
A      2019-06-05   20
B ...

output dataframe as below:

class count_of_val
A      6              

reason: (80,90,20,80, null,70, 20)

user3222101

3 Answers


IIUC, use:

(df.groupby('class', sort=False)['value']
 .apply(lambda x: (x != x.shift()).sum()-1)
 .reset_index(name='count_of_val'))

[out]

  class  count_of_val
0     A             6
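As a self-contained sketch of this approach (assuming the question's Null is a missing value, i.e. NaN, and reconstructing only the class-A rows):

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the class-A sample data; "Null" is assumed to be NaN.
df = pd.DataFrame({
    'class': ['A'] * 9,
    'value': [80, 80, 90, 20, 80, np.nan, 70, 70, 20],
})

# A row counts as a change when its value differs from the previous row
# in the same class; subtracting 1 discards the first row, which always
# differs from the (non-existent) value before it.
out = (df.groupby('class', sort=False)['value']
         .apply(lambda x: (x != x.shift()).sum() - 1)
         .reset_index(name='count_of_val'))
print(out)
```

One caveat: because NaN != NaN evaluates to True, two consecutive missing values would also be counted as a change; with the single Null in this sample that does not affect the result.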
Chris Adams

You can use the diff() method of a pandas Series:

import numpy as np

df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
df['count_of_val'].loc[df['class'] == 'A'].sum()

Output is:

6

Or if you like DataFrames:

import numpy as np
import pandas as pd

df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
desired_class = 'A'
df_count = pd.DataFrame(columns=['class', 'count_of_val'],
                        data=[[desired_class, df['count_of_val'].loc[df['class'] == desired_class].sum()]])
df_count

Output:

      class count_of_val
0       A        6
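A minimal runnable sketch of this diff-based approach, on purely numeric data without missing values (as the comments below note, diff() assumes a numeric column and does not handle strings or nulls):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only subset of the sample data (no Null row).
df = pd.DataFrame({
    'class': ['A'] * 5,
    'value': [80, 80, 90, 20, 80],
})

# diff() yields NaN for the first row; backfilling it with the next
# difference (0 here) keeps the first row from counting as a change.
df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
total = df['count_of_val'].loc[df['class'] == 'A'].sum()
print(total)  # 3 changes in (80, 90, 20, 80)
```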
coco18

Compute the rolling difference:

import numpy as np

diff_kernel = np.array([1, -1])
df['change'] = df.groupby('class', as_index=False)['value'].transform(
    lambda s: np.array(np.convolve(s, diff_kernel, 'same'), dtype=bool))

Then you can sum the boolean column:

change_sum = df.groupby('class')['change'].sum()
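A hedged sketch of the convolution approach on numeric data only. Note a boundary effect: in 'same' mode the first position of np.convolve compares the group's first value against an implicit zero, so each group picks up one spurious "change" that has to be subtracted to match the expected count:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only subset of the sample data.
df = pd.DataFrame({
    'class': ['A'] * 5,
    'value': [80, 80, 90, 20, 80],
})

# Convolving with [1, -1] produces pairwise differences; nonzero
# entries (cast to bool) mark rows where the value changed.
diff_kernel = np.array([1, -1])
df['change'] = df.groupby('class', as_index=False)['value'].transform(
    lambda s: np.array(np.convolve(s, diff_kernel, 'same'), dtype=bool))

# Subtract the one boundary hit per group described above.
change_sum = df.groupby('class')['change'].sum() - 1
print(change_sum)  # A: 3 changes in (80, 90, 20, 80)
```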
Oleg O
  • This is so slow... – ansev Mar 03 '20 at 11:27
  • Maybe the native pd.rolling is faster? `df['change'] = df.groupby('class', as_index=False)['value'].transform(lambda s: s.rolling(window=2).apply(lambda x: x[1] - x[0])).astype(bool)` – Oleg O Mar 03 '20 at 11:30
  • this does not work if the values are strings or contain the word null – ansev Mar 03 '20 at 11:38
  • I think these options are somewhat slower for the sample dataframe; with larger dataframes they would be much slower – ansev Mar 03 '20 at 11:38