
I would like to get the average and the maximum value of every consecutive positive and negative range, using the sample data below:

import pandas as pd
test_list = [-1, -2, -3, -2, -1, 1, 2, 3, 2, 1, -1, -4, -5, 2 ,4 ,7  ]
df_test = pd.DataFrame(test_list, columns=['value'])

This gives me a DataFrame like this:

    value
0      -1
1      -2
2      -3
3      -2
4      -1
5       1
6       2
7       3
8       2
9       1
10     -1
11     -4
12     -5
13      2
14      4
15      7

I would like to get something like this:

AVG1 = [-1, -2, -3, -2, -1] / 5 = - 1.8
Max1 = -3
AVG2 = [1, 2, 3, 2, 1] / 5 = 1.8 
Max2 = 3
AVG3 = [2 ,4 ,7] / 3 =  4.3
Max3 = 7

If the solution needs a new column or a new DataFrame, that is fine with me.

I know that I can use .mean, as in "pandas get column average/mean with round value", but that solution gives me the average of all positive and all negative values.

How can I build some kind of window so that I can calculate the average of the first negative group, then of the following positive group, and so on?

Regards

luki
  • I also know that I can iterate through the column, check for positive and negative values, build some lists, and then calculate everything I want, but I would like to know a better way using Pandas and maybe NumPy. – luki May 26 '20 at 10:24
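
For comparison, the loop-based approach mentioned in the comment above could look roughly like this. It is only a minimal sketch in plain Python: the helper name `run_stats` is made up for illustration, and zero is assumed to count as non-negative, which the question does not specify.

```python
from itertools import groupby

values = [-1, -2, -3, -2, -1, 1, 2, 3, 2, 1, -1, -4, -5, 2, 4, 7]

def run_stats(seq):
    """Return (mean, max) for each run of consecutive same-sign values."""
    stats = []
    # groupby starts a new run each time the key ("is negative") changes;
    # assumption: a zero would be grouped with the positive values
    for _, run in groupby(seq, key=lambda v: v < 0):
        run = list(run)
        stats.append((sum(run) / len(run), max(run)))
    return stats

result = run_stats(values)
for mean, peak in result:
    print(round(mean, 2), peak)
```

This serves as a baseline to check a vectorized solution against on small inputs.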

1 Answer


You can create a Series with np.sign to distinguish positive and negative values, compare it with its shifted values and take the cumulative sum to label the groups, and then aggregate with mean and max:

import numpy as np

s = np.sign(df_test['value'])
g = s.ne(s.shift()).cumsum()
df = df_test.groupby(g)['value'].agg(['mean','max'])
print (df)
           mean  max
value               
1     -1.800000   -1
2      1.800000    3
3     -3.333333   -1
4      4.333333    7
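
To see why the two lines that build `g` label each run, it can help to print the intermediate columns side by side. The sketch below assumes the same sample data; the names `changed`, `steps` and `extreme` are only illustrative. The last two statements are a hypothetical variant that picks the value farthest from zero in each run, since the question lists -3 rather than -1 for the first negative range.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame(
    [-1, -2, -3, -2, -1, 1, 2, 3, 2, 1, -1, -4, -5, 2, 4, 7], columns=["value"]
)

s = np.sign(df_test["value"])   # -1 for negative, 1 for positive (0 for zero)
changed = s.ne(s.shift())       # True on the first row of every run
g = changed.cumsum()            # running count of run starts = run label

steps = pd.DataFrame(
    {"value": df_test["value"], "sign": s, "changed": changed, "g": g}
)
print(steps)

# Hypothetical variant: the value farthest from zero in each run.
# idxmax on the absolute values gives the index label of that row.
idx = df_test["value"].abs().groupby(g).idxmax()
extreme = df_test.loc[idx.tolist(), "value"]
print(extreme.tolist())
```

With the sample data, `g` is 1 for rows 0 to 4, 2 for rows 5 to 9, 3 for rows 10 to 12 and 4 for rows 13 to 15.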

EDIT:

To find local extremes, the solution from this answer is used:

test_list = [-1, -2, -3, -2, -1, 1, 2, 3, 2, 1, -1, -4, -5, 2 ,4 ,7  ]
df_test = pd.DataFrame(test_list, columns=['value'])

import numpy as np
from scipy.signal import argrelextrema

# https://stackoverflow.com/a/50836425
n = 2  # number of points to be checked before and after
# Find local minima and maxima (rows that are not an extreme get NaN)
df_test['min'] = df_test.iloc[argrelextrema(df_test.value.values, np.less_equal, order=n)[0]]['value']
df_test['max'] = df_test.iloc[argrelextrema(df_test.value.values, np.greater_equal, order=n)[0]]['value']

The values after the extremes are then replaced with missing values, separately for the negative and positive groups:

s = np.sign(df_test['value'])
g = s.ne(s.shift()).cumsum()

df_test[['min1','max1']] = df_test[['min','max']].notna().astype(int).iloc[::-1].groupby(g[::-1]).cumsum()
df_test['min1'] = df_test['min1'].where(s.eq(-1) & df_test['min1'].ne(0))
df_test['max1'] = df_test['max1'].where(s.eq(1) & df_test['max1'].ne(0))

df_test['g'] = g
print (df_test)
    value  min  max  min1  max1  g
0      -1  NaN -1.0   1.0   NaN  1
1      -2  NaN  NaN   1.0   NaN  1
2      -3 -3.0  NaN   1.0   NaN  1
3      -2  NaN  NaN   NaN   NaN  1
4      -1  NaN  NaN   NaN   NaN  1
5       1  NaN  NaN   NaN   1.0  2
6       2  NaN  NaN   NaN   1.0  2
7       3  NaN  3.0   NaN   1.0  2
8       2  NaN  NaN   NaN   NaN  2
9       1  NaN  NaN   NaN   NaN  2
10     -1  NaN  NaN   1.0   NaN  3
11     -4  NaN  NaN   1.0   NaN  3
12     -5 -5.0  NaN   1.0   NaN  3
13      2  NaN  NaN   NaN   1.0  4
14      4  NaN  NaN   NaN   1.0  4
15      7  NaN  7.0   NaN   1.0  4
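
The reversed cumulative sum used for min1 and max1 is easier to follow on a single flag column. This is a minimal sketch on made-up toy data (the names `flag`, `grp` and `counted` are hypothetical): within each group, every row receives the number of flagged rows at or after it, so rows after the last flag get 0.

```python
import pandas as pd

# Hypothetical toy data: two groups, with 1 marking an extreme row
flag = pd.Series([0, 0, 1, 0, 0, 0, 1, 1, 0])
grp = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 2])

# Reverse the rows, take a cumulative sum per group, then restore the
# original row order (column assignment would realign by index as well).
counted = flag.iloc[::-1].groupby(grp.iloc[::-1]).cumsum().sort_index()

# 0 means: no flagged row at or after this row within its group
print(counted.tolist())
```

When a group holds more than one flagged row, as in the second toy group, the counter takes values above 1, so grouping by the group label together with this counter splits the group into one segment per extreme.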

It is then possible to aggregate the last 3 values per group separately with a lambda function and mean; rows with missing values in min1 or max1 are dropped by default in groupby:

df1 = df_test.groupby(['g','min1'])['value'].agg(lambda x: x.tail(3).mean())
print (df1)
g  min1
1  1.0    -2.000000
3  1.0    -3.333333
Name: value, dtype: float64

df2 = df_test.groupby(['g','max1'])['value'].agg(lambda x: x.tail(3).mean())
print (df2)
g  max1
2  1.0     2.000000
4  1.0     4.333333
Name: value, dtype: float64
jezrael
  • Thank you, master. You always have great answers. – luki May 26 '20 at 10:52
  • OK. Can I extend my question? Suppose I find, for example, the max positive value in the first positive range, which gives me the value 3. Is it possible to calculate the average of this max value and the two values before it? That means New AVG = [3 + 2 + 1] / 3 = 2. – luki May 26 '20 at 10:55
  • I would love to accept, but I have a low reputation. For me your answer is 100% useful and helpful. – luki May 26 '20 at 11:04
  • "OK. Can I extend my question? Suppose I find, for example, the max positive value in the first positive range, which gives me the value 3. Is it possible to calculate the average of this max value and the two values before it? That means New AVG = [3 + 2 + 1] / 3 = 2." This situation will repeat for each max value in each positive and negative group. Sorry, I did not carefully read all the info from your link. – luki May 26 '20 at 11:11
  • "@luki - no problem. But I want to ask what the expected output for negative values is?" For negative values: New AVG = [-3 - 2 - 1] / 3 = -2 for the first negative group, and the same way for the next ones. Generally, I am trying to solve this situation: I have to check whether the next value after our MAX is less than the AVG of [MAX, MAX-1, MAX-2] (from the 3 values before). In our example (I will write only about the first positive group) I have to check the inequality 2 (from index 8) < (3 + 2 + 1) / 3. – luki May 26 '20 at 11:19
  • "@luki - hmmm, in the first group the maximum -1 occurs twice. It seems you want to use the second -1, because it has 2 previous values. So can we say that if there are several such values and one has no previous values, it is not counted, like the first -1 in the first group? And what about the 2nd negative group? There the max is -1, but there are no previous negative values. How should this situation be processed?" There is a small misunderstanding here. I am thinking about the local extremum. That means that in the first group our max is, for me, -3 as the local extremum. If a local extremum does not have previous values, we can omit it. – luki May 26 '20 at 11:34
  • Sorry for this misunderstanding. – luki May 26 '20 at 11:35
  • And maybe an important piece of information for you: the index can be time. – luki May 26 '20 at 11:38
  • "Is there always only one extreme per group?" Unfortunately not. The data can have any number of local extrema, in the range <1; infinity). A good real-world example is a temperature chart over a time frame of 5 years. We have minus and plus temperatures. We have local extrema in summer and local extrema in winter. By extrema I mean local max values when the temperature is positive and local min values when the temperature is negative. Do not rush with an answer. I am still analyzing your first post and trying to understand it :) – luki May 26 '20 at 12:09
  • @luki - Edited answer, I think now it is more complicated. Please check it. – jezrael May 26 '20 at 12:41
  • ```s = np.sign(df_test['value'])``` ```g = s.ne(s.shift()).cumsum()``` Very clever, those two lines :) – luki May 26 '20 at 12:44
  • "@luki - Edited answer, I think now it is more complicated. Please check it." OK, I will do it, but I will need much more time to understand it. I have just finished with your first, very clever answer. – luki May 26 '20 at 12:48
  • Please help me understand one thing. ```n=2 # number of points to be checked before and after``` How does it work? https://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.signal.argrelextrema.html says that ```order : int, optional - How many points on each side to use for the comparison to consider comparator(n, n+x) to be True.``` I changed the test data for the script from this post https://stackoverflow.com/a/50836425. The data are [-1,-2,-3,-4,-3,-4,-3,-5,-4,-5,-4, -6,-3,-1]. I tried n in the range <1;4> but I cannot see the rule. I only know that for n=1 it returns all extrema. Regards – luki May 26 '20 at 17:06
  • @luki I am offline, on my phone only, and honestly have no idea. Maybe it is best to ask a new question about it. – jezrael May 26 '20 at 17:47
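
On the open question about `order` in the last comments, one way to see the rule is to run `argrelextrema` on the data from that comment for several values of `order`. The sketch below uses the strict comparator `np.less` instead of `np.less_equal`, an assumption made here so that ties and the clipped edges do not count as extrema. The dictionary name `found` is only illustrative.

```python
import numpy as np
from scipy.signal import argrelextrema

data = np.array([-1, -2, -3, -4, -3, -4, -3, -5, -4, -5, -4, -6, -3, -1])

found = {}
for n in range(1, 5):
    # A point is reported only if it is strictly lower than each of the
    # n neighbours on its left and each of the n neighbours on its right.
    found[n] = argrelextrema(data, np.less, order=n)[0].tolist()
    print(n, found[n])
```

With `order=1` every dip below both direct neighbours is reported (indices 3, 5, 7, 9 and 11). From `order=2` upward only index 11 (the value -6) remains, because each of the other dips has an equal or lower value within two positions.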