Divide a DataFrame by a threshold on distinct names

Question

I have the following DataFrame as an example:

name  age
Ashe   12
Ashe   13
Ashe   23
John   33
John   45
Karin  55
David  84
Zaki   34
Mano   45

My threshold is on distinct names: each part should contain 3 distinct names, however many rows each name has. So the first output should be:

name  age
Ashe   12
Ashe   13
Ashe   23
John   33
John   45
Karin  55

and the second DataFrame:

name  age
David  84
Zaki   34
Mano   45

What can I do?

Does this answer your question? [subsetting a Python DataFrame](https://stackoverflow.com/questions/19237878/subsetting-a-python-dataframe) — azro, Mar 24 '20 at 12:55
No, it does not, because there the query is on age. Here I only have the name: I need 3 distinct names per part, even if each name is repeated 5 times. — Ashraf Khaled, Mar 24 '20 at 13:07

etrnote · Accepted Answer · 2020-03-24T14:15:55.987

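from itertools import islice

def chunk(lst, size):
    # Split any iterable into tuples of at most `size` items.
    lst = iter(lst)
    return iter(lambda: tuple(islice(lst, size)), ())

# df is the DataFrame from the question.
# Group the distinct names (in order of first appearance) three at a time.
name_groups = list(chunk(df.name.unique(), 3))

# Build one DataFrame per group of names, keyed 'df0', 'df1', ...
data = {}
for i, group in enumerate(name_groups):
    data[f'df{i}'] = df[df.name.isin(group)]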

The chunk function splits an iterable into chunks of at most size elements (3 in our case); the last chunk may be shorter.
You can read more here : https://stackoverflow.com/a/22045226/13104290
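To see what chunk does on its own, here is a small sketch on a plain list. The seventh name, "Extra", is not in the question; it is added only to show that the last chunk can be shorter than the others.

```python
from itertools import islice

def chunk(lst, size):
    # One shared iterator, so each islice call continues where the last stopped.
    lst = iter(lst)
    # iter(callable, sentinel) calls the lambda repeatedly and stops as soon
    # as it returns the empty tuple, i.e. when the input is exhausted.
    return iter(lambda: tuple(islice(lst, size)), ())

# "Extra" is a made-up seventh name, for illustration only.
names = ["Ashe", "John", "Karin", "David", "Zaki", "Mano", "Extra"]
groups = list(chunk(names, 3))
print(groups)
# [('Ashe', 'John', 'Karin'), ('David', 'Zaki', 'Mano'), ('Extra',)]
```

Note that chunk returns a lazy iterator, which is why the answer wraps it in list().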

name_groups is a list of tuples, each with up to 3 elements:
[('Ashe', 'John', 'Karin'), ('David', 'Zaki', 'Mano')]

Because we passed df.name.unique(), each name appears only once.
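As a quick check, the sketch below rebuilds the question's sample data and shows that unique() returns each name once, in order of first appearance. That ordering is what keeps Ashe, John and Karin together in the first group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Ashe", "Ashe", "Ashe", "John", "John",
             "Karin", "David", "Zaki", "Mano"],
    "age": [12, 13, 23, 33, 45, 55, 84, 34, 45],
})

# unique() drops repeats and keeps the order in which names first appear.
unique_names = list(df["name"].unique())
print(unique_names)
# ['Ashe', 'John', 'Karin', 'David', 'Zaki', 'Mano']
```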

Now we need to create each new DataFrame dynamically. We do this by creating a dictionary and adding each new partition to it, one at a time.

The dictionary now contains two dataframes, df0 and df1.

data['df0']:

    name    age
0   Ashe    12
1   Ashe    13
2   Ashe    23
3   John    33
4   John    45
5   Karin   55

data['df1']:

    name    age
6   David   84
7   Zaki    34
8   Mano    45
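For comparison, the same split can probably be done without the chunk helper. This is a different technique from the one above, shown only as a sketch: pd.factorize numbers the names by first appearance, and integer division by the group size turns those numbers into part ids. The variable names names_per_part and part_ids are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Ashe", "Ashe", "Ashe", "John", "John",
             "Karin", "David", "Zaki", "Mano"],
    "age": [12, 13, 23, 33, 45, 55, 84, 34, 45],
})

# factorize numbers each distinct name by order of first appearance:
# Ashe -> 0, John -> 1, Karin -> 2, David -> 3, and so on.
codes, _ = pd.factorize(df["name"])

# Integer division maps names 0-2 to part 0, names 3-5 to part 1, etc.
names_per_part = 3  # illustrative name, not from the answer
part_ids = codes // names_per_part

# One sub-DataFrame per part id, keyed 'df0', 'df1', ... as in the answer.
data = {f"df{part}": sub for part, sub in df.groupby(part_ids)}

print(data["df0"])
print(data["df1"])
```

Both approaches keep the original row order and index labels inside each part. The dictionary-of-tuples version makes the name groups explicit, while this one avoids the Python-level loop.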
