I have a one-column DataFrame with 5 million rows. I want to reduce it to 25k rows by aggregating every 200 consecutive rows into one (25k x 200 = 5,000,000). The value of each new row should be the class label that is most frequent among those 200 rows.
Example:
import pandas as pd
df = pd.DataFrame({'a': ['s','s','t','s','s','t','s','t','t','w','w','t','w','s','w']})
print(df)
Out[60]:
a
0 s
1 s
2 t
3 s
4 s
5 t
6 s
7 t
8 t
9 w
10 w
11 t
12 w
13 s
14 w
I want to do something like this (my_rolling_apply and majority_voted_class are made-up names, just to illustrate):
my_rolling_apply(my_column, window_size=3, function=majority_voted_class)
To get as output :
Out[2]:
a
0 s
1 s
2 t
3 w
4 w
How can I do this? Is there a function that can handle this task?
Update:
The only constraint is that I need to control the size of the groups. The rows should be split into consecutive, equal-sized groups, and each group should be assigned its most common label.
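
To make the requirement concrete, here is a minimal sketch of the behaviour I am after, not necessarily the best or fastest way to do it. It assumes the groups are consecutive, non-overlapping blocks of rows, and that a tie is broken by taking the alphabetically first label (that is simply what Series.mode() returns first). The names blockwise_majority and majority_voted_class are hypothetical helpers, not pandas functions.

```python
import numpy as np
import pandas as pd


def majority_voted_class(values: pd.Series):
    # Series.mode() returns all most-frequent values in sorted order;
    # taking the first one is an arbitrary tie-break (assumption).
    return values.mode().iloc[0]


def blockwise_majority(column: pd.Series, window_size: int) -> pd.Series:
    # Integer division of the row position gives a block id:
    # rows 0..window_size-1 -> 0, the next window_size rows -> 1, and so on.
    block_ids = np.arange(len(column)) // window_size
    return column.groupby(block_ids).agg(majority_voted_class)


# The 15 values printed above, grouped 3 at a time.
df = pd.DataFrame({'a': ['s', 's', 't', 's', 's', 't', 's', 't', 't',
                         'w', 'w', 't', 'w', 's', 'w']})
result = blockwise_majority(df['a'], window_size=3)
print(result.tolist())  # ['s', 's', 't', 'w', 'w']
```

With window_size=200 on the real data this would give 25k rows. If the number of rows is not a multiple of window_size, the last block is simply shorter in this sketch.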