Create multiple DataFrames from one pandas DataFrame by grouping by column values

Question

I have the following dataframe, but with a much larger number of rows (100, 1000, etc.):

#	Person1	Person2	Age
1	Alex	Maria	20
2	Paul	Peter	20
3	Klaus	Hans	30
4	Victor	Otto	30
5	Gerry	Justin	30

Problem:

Now I want to print separate dataframes, each containing all the people who have the same age, so the output should look like this:

DF1:

#	Person1	Person2	Age
1	Alex	Maria	20
2	Paul	Peter	20

DF2:

#	Person1	Person2	Age
3	Klaus	Hans	30
4	Victor	Otto	30
5	Gerry	Justin	30

I've tried this with the following functions:

Try1:

def groupAge(data):
    x = -1
    for x in range(len(data)):
        #q = len(data[data["Age"] == data.loc[x, "Age"]])

        b = data[data["Age"] == data.loc[x,"Age"]]
        x = x + 1
        print(b,x)


    return  b

Try2:

def groupAge(data):
    x = 0

    for x in range(len(data)):

        q = len(data[data["Age"] == data.loc[x, "Age"]])
        x = x + 1

        for k in range(0,q,q):
            b = data[data["Age"] == data.loc[k,"Age"]]
            print(b)

        return  b

Neither of them produced the right output. Try1 prints a few groups, each of them twice, but doesn't go through the entire dataframe. Try2 only prints the first Age "group", also twice.

I can't work out why the output is always printed twice, or why the loop doesn't go through the entire dataframe.

Can anyone help?

You cannot iterate through a dataframe in the way you mentioned here. Use df.iterrows() or df.apply() — sharathnatraj, Jan 10 '21 at 01:46
Create a dict of dataframes using `groupby`: `df_dict = {age: d for age, d in df.groupby('Age')}`, then access each dataframe with its `'Age'` value as the key, e.g. `df_dict[30]`. — Trenton McKinney, Jan 10 '21 at 02:12
Use `dfs = dict(tuple(df.groupby('Age')))` to create a dictionary of dataframes. Then access your dataframes using age as the key like `dfs[20]` and `dfs[30]` — Scott Boston, Jan 10 '21 at 02:16
Very much like @TrentonMcKinney's method: he is using a dictionary comprehension, whereas I am using Python built-in functions. — Scott Boston, Jan 10 '21 at 02:17

score 1 · Accepted Answer · edited Jan 10 '21 at 21:06

In your first try, you loop over every row position of the dataframe and run the line below once per row, with x taking the values 0, 1, 2, 3 and 4 in turn. As a side note, x = x + 1 is not required: range already supplies the next value on each iteration, so the increment only changes the number that gets printed.

b = data[data["Age"] == data.loc[x,"Age"]]

The groups are printed more than once because the loop visits every row of data, and several rows share the same Age, so the same filter is executed repeatedly. For example:

print(data.loc[0, 'Age'])  # 20
print(data.loc[1, 'Age'])  # 20

Both of the above statements print 20, so for x = 0 and x = 1 the loop effectively executes the following command twice.

b = data[data["Age"] == 20]
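To make the repetition visible, here is a minimal sketch that rebuilds the sample data from the question and lists the Age value the filter uses on each pass of the original loop. The names ages_per_iteration and distinct_ages are only illustrative and do not appear in the question.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "#": [1, 2, 3, 4, 5],
    "Person1": ["Alex", "Paul", "Klaus", "Victor", "Gerry"],
    "Person2": ["Maria", "Peter", "Hans", "Otto", "Justin"],
    "Age": [20, 20, 30, 30, 30],
})

# The Age value that the original loop filters on at each row position x.
ages_per_iteration = [int(data.loc[x, "Age"]) for x in range(len(data))]
print(ages_per_iteration)  # [20, 20, 30, 30, 30]

# The distinct ages: one filter per value here is enough.
distinct_ages = data["Age"].unique().tolist()
print(distinct_ages)  # [20, 30]
```

With this data, the filter for age 20 runs twice and the filter for age 30 runs three times, which is where the repeated output comes from.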

I think all you need is this:

# Distinct ages, in order of first appearance
unq_age = data['Age'].unique()
# One filtered dataframe per distinct age
df1 = data.loc[data['Age'] == unq_age[0]]
df2 = data.loc[data['Age'] == unq_age[1]]

df1
   # Person1 Person2  Age
0  1    Alex   Maria   20
1  2    Paul   Peter   20

df2
   # Person1 Person2  Age
2  3   Klaus    Hans   30
3  4  Victor    Otto   30
4  5   Gerry  Justin   30
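With many distinct ages, indexing the unique values by hand does not scale. Here is a minimal sketch of the groupby approach suggested in the comments, assuming the sample data from the question. The name frames_by_age is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "#": [1, 2, 3, 4, 5],
    "Person1": ["Alex", "Paul", "Klaus", "Victor", "Gerry"],
    "Person2": ["Maria", "Peter", "Hans", "Otto", "Justin"],
    "Age": [20, 20, 30, 30, 30],
})

# Build one dataframe per distinct Age, keyed by the age value.
frames_by_age = {age: group for age, group in data.groupby("Age")}

# Print every group exactly once.
for age, group in frames_by_age.items():
    print(f"Age {age}:")
    print(group, end="\n\n")
```

Each sub-dataframe can then be looked up by its age, for example frames_by_age[20] or frames_by_age[30], however many distinct ages the real data contains.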
