
I have a dataframe with some primary key idx, a binary column cat and a column with values y. See example data below.

The binary column can be seen as defining categories: the first run of rows with 0 is category 1, the following run of rows with 1 is category 2, the next run of rows with 0 is category 3, and so forth, where each category can have a different number of rows. So there are 5 categories in this example dataframe, split up by consecutive runs of the same binary value. The actual dataframe consists of many more categories with many more rows per category.

I'd like to make a list of dataframes where each dataframe corresponds to one category. So this example data would be split up into a list of 5 dataframes.

What's the best way to do this with a large dataframe?

import pandas as pd
import numpy as np
import random

idx = list(range(55))
cat = [0,0,0,0,0,0,0,0,0,0,0,
       1,1,1,1,1,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,0,0,0,0,
       1,1,1,1,1,1,1,1,1,1,
       0,0,0,0,0,0,0,0,0,0,0]
y = [random.random()*10 for _ in range(55)]

df = pd.DataFrame({'idx':idx, 'cat':cat, 'y':y})
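
For reference, one common way to get such a split is to label each consecutive run and group on that label. The following is only a minimal sketch under the assumption that a "category" means an unbroken run of equal values in cat; the names run_id and dfs are made up for illustration and are not part of the original data.

```python
import random

import pandas as pd

# Rebuild the example frame so the sketch runs on its own.
cat = [0] * 11 + [1] * 10 + [0] * 13 + [1] * 10 + [0] * 11
df = pd.DataFrame({
    "idx": list(range(55)),
    "cat": cat,
    "y": [random.random() * 10 for _ in range(55)],
})

# Flag every row whose `cat` differs from the previous row (the first row
# always counts as a change), then take the cumulative sum of those flags
# so that each consecutive run gets its own integer label: 1, 2, 3, ...
run_id = df["cat"].ne(df["cat"].shift()).cumsum()  # hypothetical helper name

# Group on the run label and collect the sub-frames in order of appearance.
dfs = [group for _, group in df.groupby(run_id)]

print(len(dfs))                 # number of runs found
print([len(g) for g in dfs])    # rows per run
```

The shift/compare/cumsum steps are vectorized, so the labelling itself should scale to a large dataframe; whether materializing every group as a separate dataframe is worthwhile depends on what is done with them afterwards.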
Bas R
  • Welcome to Stack Overflow. This is a [questions and answers site](https://stackoverflow.com/about), not a code-writing service. Please read through [How to Ask](https://stackoverflow.com/help/how-to-ask) and [edit] your question to reflect your work. – ljmc Jan 02 '23 at 14:29
  • This does indeed answer my question, thanks! – Bas R Jan 02 '23 at 15:01
