1

I have the following pandas dataframe :

   a
0  0
1  0
2  1
3  2
4  2
5  2
6  3
7  2
8  2
9  1

I want to store the values in another dataframe such as every group of consecutive indentical values make a labeled group like this :

   A  B
0  0  2
1  1  1
2  2  3
3  3  1
4  2  2
5  1  1

The column A represent the value of the group and B represents the number of occurences.

this is what i've done so far:

df = pd.DataFrame({'a':[0,0,1,2,2,2,3,2,2,1]})
df2 = pd.DataFrame()
for i,g in df.groupby([(df.a != df.a.shift()).cumsum()]):
    vc = g.a.value_counts()
    df2 = df2.append({'A':vc.index[0], 'B': vc.iloc[0]}, ignore_index=True).astype(int)

It works but it's a bit messy.

Do you think of a shortest/better way of doing this ?

lnper00
  • 43
  • 2
  • https://stackoverflow.com/questions/40802800/how-to-groupby-consecutive-values-in-pandas-dataframe – ALollz Nov 08 '19 at 18:06

2 Answers2

1

I would try:

df['blocks'] = df['a'].ne(df['a'].shift()).cumsum()
(df.groupby(['a','blocks'],
           as_index=False,
           sort=False)
   .count()
   .drop('blocks', axis=1)
)

Output:

   a  B
0  0  2
1  1  1
2  2  3
3  3  1
4  2  2
5  1  1
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Last `1 1` is missing. `drop_duplicate` dropping `1 1` becuase its already seen once at index `2` – Poojan Nov 08 '19 at 18:09
1

use GrouBy.agg in Pandas >0.25.0:

new_df= ( df.groupby(df['a'].ne(df['a'].shift()).cumsum(),as_index=False)
            .agg(A=('a','first'),B=('a','count')) )

print(new_df)

   A  B
0  0  2
1  1  1
2  2  3
3  3  1
4  2  2
5  1  1

pandas <0.25.0

new_df= ( df.groupby(df['a'].ne(df['a'].shift()).cumsum(),as_index=False)
            .a
            .agg({'A':'first','B':'count'}) )
ansev
  • 30,322
  • 5
  • 17
  • 31