2

I'm really stuck. I have a dataframe with a column that goes like the following

Dailychange:
1
2
3
0
-1
-2
-3
1
2

I want to calculate number of positive and negative consecutively into two lists with the output pos[3,2] nutral[1] neg[3]. I have tried resolving it with a simple loop like

    # for i in symbol:
    #     if (symbol['Dailychange']>0):
    #         counter+=1
    #         cons_list.append(counter)
    #     else:
    #         counter=0
    #         cons_list.append(counter)
    # print(cons_list)

and this outputs an error, due to my if statement. Then I tried to use the where function

symbol['positive']=symbol.where(symbol['Dailychange']>0,'positive','Negative')

That didn't work out either. I really appreciate your help on this.

Longroadahead
  • 387
  • 3
  • 19

2 Answers2

4

We need a new para here , I created by using np.where

df['New']=np.where(df['Num']>0,'positive',np.where(df['Num']==0,'Nutral','Negative'))
s=df.groupby([df['New'],(df['New']!=df['New'].shift()).cumsum()]).size().reset_index(level=1,drop=True)
s
Out[41]: 
New
Negative    3
Nutral      1
positive    3
positive    2
dtype: int64

More info

(df['New']!=df['New'].shift()).cumsum()
Out[804]: 
0    1
1    1
2    1
3    2
4    3
5    3
6    3
7    4
8    4
Name: New, dtype: int32

(df['New']!=df['New'].shift())
Out[805]: 
0     True 
1    False
2    False
3     True # here is the status change 
4     True # here is the status change 
5    False  # those one do not change should carry over the same number as before 
6    False
7     True # here is the status change 
8    False
Name: New, dtype: bool

We consider the consecutively positive or negative as one group, once they changed they blone to next group

Also one more thing True + False =1

BENY
  • 317,841
  • 20
  • 164
  • 234
  • Hi really appreciate your answer. I get the first line, the key is using numpy instead of pandas. But I'm confuse about the syntax of the second line. How is cumsum come into play. Thanks again. – Longroadahead Feb 16 '18 at 20:13
  • @Longroadahead (df['New']!=df['New'].shift()).cumsum() , this is to find the consecutively: -), think about number 1,1,1,2,2,2,3,3,3,1,1,1, find the group of repeat pattern will be 000111222333 – BENY Feb 16 '18 at 20:39
  • I just can't wrap my head around it for some reason, my understanding is this. igroup by df['New'] and the condition is (df['New'] not equal to the next inline of cumulative sum) then .size find the number of element and reset_index is putting them together. I just don't get the cumulative sum part. Really appreciate your help on this!! – Longroadahead Feb 16 '18 at 22:36
  • @Longroadahead add more info – BENY Feb 16 '18 at 22:42
  • Isn't cumulative sum adds value after another? ie: 1+1+1 etc. I must misunderstood the doc somehow. – Longroadahead Feb 16 '18 at 23:03
  • 1
    @Longroadahead yess it will , for example , T F T F , cumsum will be 1122, they only add up the True value with means if it is False It will carry over the previous value ( They are in the same group) – BENY Feb 16 '18 at 23:15
0

pd.cut and groupby is exactly what you are looking for -

import numpy as np
import pandas as pd
x = pd.DataFrame([1, 2, 3, 0, -1, -2, -3, 1, 2],columns=['Dailychange'])
col = x['Dailychange']
x['Labels'] = list(pd.cut(x['Dailychange'],[-float("inf"),-0.1,0.1,float("inf")],labels=['neg','neutral','pos']))
# for i,e in enumerate(x['Labels']):
#     print(col[i],x['Labels'][i])
x['chunk_number'] = (x['Labels'] != x['Labels'].shift()).cumsum()
grouped_df = x.groupby('chunk_number')
for i in grouped_df.groups.keys():
    print(list(grouped_df.get_group(i)['Dailychange']))

Also Checkout: Documentation | Related question | Another Related question

Udayraj Deshmukh
  • 1,814
  • 1
  • 14
  • 22