Window Rolling Aggregation for categorical Data

Asked Oct 16 '18 at 19:43

Active Oct 16 '18 at 20:03

Viewed 207 times

I have a one-column DataFrame with 5 million rows. I want to reduce it to 25k rows by aggregating every 200 consecutive rows into one (25k x 200 = 5,000,000). Each aggregated row should take the class label that is most frequent among its 200 rows.

Example:

import pandas as pd

df = pd.DataFrame({'a': ['s','s','t','s','s','t','s','t','t','w','w','t','w','s','w']})
print(df)

Out[60]: 
     a
0   s
1   s
2   t
3   s
4   s
5   t
6   s
7   t
8   t
9   w
10  w
11  t
12  w
13  s
14  w

I want to do something like this (an example):

my_rolling_apply(my_column, window_size=3, function=majority_voted_class)

to get this output:

Out[2]: 
   a
0  s
1  s
2  t
3  w
4  w

The question is: how can I do this? Is there any function that can handle this task?

Update:

The only issue here is that I need to control the size of the groups. The grouping should produce equal-sized groups, so that the most common label can be assigned to each group.
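
One way to get fixed-size groups, offered as a minimal sketch rather than a definitive solution, is to build the group key from the row position: integer-dividing the position by the window size puts every run of `window_size` consecutive rows into the same group. The helper name `majority_by_block` is hypothetical, and the sample data are the 15 rows printed above. Ties within a block are not handled specially here; whichever label `value_counts` lists first wins.

```python
import numpy as np
import pandas as pd


def majority_by_block(series: pd.Series, window_size: int) -> pd.Series:
    """Most frequent value in each consecutive block of `window_size` rows.

    Hypothetical helper for illustration; not a pandas built-in.
    """
    # Rows 0..n-1 get key 0, rows n..2n-1 get key 1, and so on.
    block_ids = np.arange(len(series)) // window_size
    # value_counts() sorts by frequency in descending order,
    # so index[0] is the most common label in the block.
    return series.groupby(block_ids).agg(lambda block: block.value_counts().index[0])


df = pd.DataFrame({'a': ['s', 's', 't', 's', 's', 't', 's', 't', 't',
                         'w', 'w', 't', 'w', 's', 'w']})

result = majority_by_block(df['a'], window_size=3).to_frame()
print(result)
```

With `window_size=200` the same call would apply to the full column. If the number of rows is not a multiple of the window size, the last group is simply shorter than the others.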

edited Oct 16 '18 at 20:03

asked Oct 16 '18 at 19:43

smerllo


What have you tried so far? Where exactly is the problem? – Ralf Oct 16 '18 at 19:45
I could not find an appropriate function for this. – smerllo Oct 16 '18 at 19:46
I know about functions like pd.rolling_apply, but it does not seem to be what I am looking for. – smerllo Oct 16 '18 at 19:47
From [this question](https://stackoverflow.com/questions/15222754/group-by-pandas-dataframe-and-select-most-common-string-factor) you may be able to use something like `df.groupby('a').agg(lambda x:x.value_counts().index[0])` – G. Anderson Oct 16 '18 at 20:00
Correct. The only issue is that I need to control the size of the groups. The grouping should produce equal-sized groups, so that the most common label can be assigned to each group. – smerllo Oct 16 '18 at 20:02
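
For a column as large as the 5 million rows mentioned in the question, calling a Python lambda once per group may be slow. A possible alternative, sketched here under the assumptions that the column has no missing values and that a trailing partial block can be dropped, is to encode the labels as integers with `pd.factorize` and count them per block with `np.bincount`. The function name `block_mode` is hypothetical.

```python
import numpy as np
import pandas as pd


def block_mode(labels: pd.Series, window_size: int) -> pd.Series:
    """Majority label per fixed-size block, computed with NumPy counting.

    Hypothetical helper; assumes no missing values in `labels`.
    """
    # Map each distinct label to an integer code (0, 1, 2, ...).
    codes, uniques = pd.factorize(labels)
    n_blocks = len(codes) // window_size
    n_labels = len(uniques)
    # Drop any trailing rows that do not fill a complete block.
    codes = codes[: n_blocks * window_size]
    # Block number of every remaining row: 0,0,...,1,1,...
    block_ids = np.repeat(np.arange(n_blocks), window_size)
    # Count each (block, label) pair in one pass over a flattened 2-D table.
    counts = np.bincount(block_ids * n_labels + codes,
                         minlength=n_blocks * n_labels)
    counts = counts.reshape(n_blocks, n_labels)
    # argmax returns the first maximum, so ties go to the label
    # that appeared earliest in the whole column.
    winners = counts.argmax(axis=1)
    return pd.Series(uniques[winners], name=labels.name)


sample = pd.Series(['s', 's', 't', 's', 's', 't', 's', 't', 't',
                    'w', 'w', 't', 'w', 's', 'w'], name='a')
print(block_mode(sample, window_size=3))
```

The count table has one row per block and one column per distinct label, so this approach suits columns with a small number of classes.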

0 Answers