
I have the below dataframe and want to count the number of times the value changed over time.

input dataframe:

class  date         value
A      2019-01-02   80
A      2019-02-02   80
A      2019-03-02   90
A      2019-04-02   20
A      2019-05-02   80
A      2019-06-02   Null
A      2019-06-03   70
A      2019-06-04   70
A      2019-06-05   20
B ...

output dataframe as below:

class count_of_val
A      6              

reason: (80,90,20,80, null,70, 20)

user3222101

3 Answers


IIUC, use:

(df.groupby('class', sort=False)['value']
 .apply(lambda x: (x != x.shift()).sum()-1)
 .reset_index(name='count_of_val'))

[out]

  class  count_of_val
0     A             6
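As a self-contained sketch of this approach (assuming the question's Null is a missing value, i.e. NaN, and reconstructing only the class-A rows):

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the class-A sample data; "Null" is assumed to be NaN.
df = pd.DataFrame({
    'class': ['A'] * 9,
    'value': [80, 80, 90, 20, 80, np.nan, 70, 70, 20],
})

# A row counts as a change when its value differs from the previous row
# in the same class; subtracting 1 discards the first row, which always
# differs from the (non-existent) value before it.
out = (df.groupby('class', sort=False)['value']
         .apply(lambda x: (x != x.shift()).sum() - 1)
         .reset_index(name='count_of_val'))
print(out)
```

One caveat: because NaN != NaN evaluates to True, two consecutive missing values would also be counted as a change; with the single Null in this sample that does not affect the result.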
Chris Adams

You can use the diff() method of a pandas Series:

import numpy as np

df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
df['count_of_val'].loc[df['class'] == 'A'].sum()

Output is:

6

Or if you like DataFrames:

import numpy as np
import pandas as pd

df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
desired_class = 'A'
df_count = pd.DataFrame(columns=['class', 'count_of_val'],
                        data=[[desired_class, df['count_of_val'].loc[df['class'] == desired_class].sum()]])
df_count

Output:

      class count_of_val
0       A        6
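A minimal runnable sketch of this diff-based approach, on purely numeric data without missing values (as the comments below note, diff() assumes a numeric column and does not handle strings or nulls):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only subset of the sample data (no Null row).
df = pd.DataFrame({
    'class': ['A'] * 5,
    'value': [80, 80, 90, 20, 80],
})

# diff() yields NaN for the first row; backfilling it with the next
# difference (0 here) keeps the first row from counting as a change.
df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
total = df['count_of_val'].loc[df['class'] == 'A'].sum()
print(total)  # 3 changes in (80, 90, 20, 80)
```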
coco18

Compute the rolling difference:

import numpy as np

diff_kernel = np.array([1, -1])
df['change'] = df.groupby('class', as_index=False)['value'].transform(
    lambda s: np.array(np.convolve(s, diff_kernel, 'same'), dtype=bool))

Then you can sum the boolean column:

change_sum = df.groupby('class')['change'].sum()
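A hedged sketch of the convolution approach on numeric data only. Note a boundary effect: in 'same' mode the first position of np.convolve compares the group's first value against an implicit zero, so each group picks up one spurious "change" that has to be subtracted to match the expected count:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only subset of the sample data.
df = pd.DataFrame({
    'class': ['A'] * 5,
    'value': [80, 80, 90, 20, 80],
})

# Convolving with [1, -1] produces pairwise differences; nonzero
# entries (cast to bool) mark rows where the value changed.
diff_kernel = np.array([1, -1])
df['change'] = df.groupby('class', as_index=False)['value'].transform(
    lambda s: np.array(np.convolve(s, diff_kernel, 'same'), dtype=bool))

# Subtract the one boundary hit per group described above.
change_sum = df.groupby('class')['change'].sum() - 1
print(change_sum)  # A: 3 changes in (80, 90, 20, 80)
```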
Oleg O
  • This is so slow... – ansev Mar 03 '20 at 11:27
  • Maybe the native pd.rolling is faster? `df['change'] = df.groupby('class', as_index=False)['value'].transform(lambda s: s.rolling(window=2).apply(lambda x: x[1] - x[0])).astype(bool)` – Oleg O Mar 03 '20 at 11:30
  • this does not work if the values are strings or contain the word null – ansev Mar 03 '20 at 11:38
  • I think these options are somewhat slower for the sample dataframe; with larger dataframes they would be much slower – ansev Mar 03 '20 at 11:38