I have a dataframe with a primary key idx, a binary column cat, and a column of values y. See the example data below.
The binary column can be read as a category marker: the first run of rows with 0 is category 1, the following run of rows with 1 is category 2, the next run of rows with 0 is category 3, and so forth, where each category has a different number of rows. The example dataframe therefore contains 5 categories, delimited by runs of consecutive identical values; the actual dataframe has many more categories and many more rows per category.
I'd like to make a list of dataframes where each dataframe corresponds to one category, so this example data would be split into a list of 5 dataframes.
What's the best way to do this with a large dataframe?
import pandas as pd
import numpy as np
import random
idx = list(range(55))
cat = [0,0,0,0,0,0,0,0,0,0,0,
       1,1,1,1,1,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,0,0,0,0,
       1,1,1,1,1,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,0,0]
y = [random.random()*10 for _ in range(55)]
df = pd.DataFrame({'idx':idx, 'cat':cat, 'y':y})
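For illustration, here is a minimal sketch of one common way to express the split described above: label each run of consecutive identical values with a running counter, then group on that label. The helper name split_consecutive and the small demo frame are hypothetical stand-ins, not part of the original data, and this is not claimed to be the fastest option for very large frames.

```python
import pandas as pd


def split_consecutive(df, col="cat"):
    """Split df into a list of frames, one per run of equal values in col.

    Hypothetical helper for illustration only.
    """
    # A new run starts wherever the value differs from the previous row;
    # the cumulative sum of those change points gives each run its own id.
    run_id = (df[col] != df[col].shift()).cumsum()
    # sort=False keeps the runs in their original row order.
    return [group for _, group in df.groupby(run_id, sort=False)]


# Small stand-in for the example data (assumed values, same column names).
demo = pd.DataFrame({
    "idx": range(8),
    "cat": [0, 0, 1, 1, 1, 0, 1, 1],
    "y": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5],
})

parts = split_consecutive(demo)
for part in parts:
    print(part, end="\n\n")
```

Applied to the example df from the question, the same call would be split_consecutive(df). The grouping key is computed in a vectorized way, so the main cost on a large frame is likely to be materializing the list of sub-frames.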