I have the following dataframe, but with many more rows (100, 1000, etc.):

# Person1 Person2 Age
1 Alex Maria 20
2 Paul Peter 20
3 Klaus Hans 30
4 Victor Otto 30
5 Gerry Justin 30

Problem:

Now I want to print separate dataframes, each containing all the people who have the same age, so the output should look like this:

DF1:

# Person1 Person2 Age
1 Alex Maria 20
2 Paul Peter 20

DF2:

# Person1 Person2 Age
3 Klaus Hans 30
4 Victor Otto 30
5 Gerry Justin 30

I've tried this with the following functions:

Try1:

def groupAge(data):
    x = -1
    for x in range(len(data)):
        #q = len(data[data["Age"] == data.loc[x, "Age"]])

        b = data[data["Age"] == data.loc[x,"Age"]]
        x = x + 1
        print(b,x)


    return  b

Try2:

def groupAge(data):
    x = 0

    for x in range(len(data)):

        q = len(data[data["Age"] == data.loc[x, "Age"]])
        x = x + 1

        for k in range(0,q,q):
            b = data[data["Age"] == data.loc[k,"Age"]]
            print(b)

        return  b

Neither of them produced the right output. Try1 prints a few groups, each of them twice, but doesn't go through the entire dataframe, and Try2 prints only the first Age "group", also twice.

I can't work out why the output is always printed twice, or why the loop doesn't go through the entire dataframe.

Can anyone help?

  • You cannot iterate through a dataframe in the way you mentioned here. Use df.iterrows() or df.apply() instead. – sharathnatraj Jan 10 '21 at 01:46
  • Create a dict of dataframes using `groupby`: `df_dict = {age: d for age, d in df.groupby('Age')}`, then access each dataframe with its age value as the key, e.g. `df_dict[30]`. – Trenton McKinney Jan 10 '21 at 02:12
  • Use `dfs = dict(tuple(df.groupby('Age')))` to create a dictionary of dataframes. Then access your dataframes using the age as the key, like `dfs[20]` and `dfs[30]`. – Scott Boston Jan 10 '21 at 02:16
  • This is very much like @TrentonMcKinney's method: he is using a dictionary comprehension, whereas I am using Python's built-in functions. – Scott Boston Jan 10 '21 at 02:17

1 Answer

In your first try, you loop over the length of the dataframe and run the line below on every iteration, with x taking the values 0, 1, 2, 3 and 4 in turn. As a side note, x = x + 1 is not required; range already takes care of that.

b = data[data["Age"] == data.loc[x,"Age"]]

It prints each group more than once because the loop runs once per row, not once per distinct age, so rows that share an age repeat the same command. For example:

print(data.loc[0, 'Age'])
print(data.loc[1, 'Age']) 
20
20

Both of the statements above print 20, so substituting 20 into the loop means you effectively execute the following command twice:

b = data[data["Age"] == 20]
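To see the repetition directly, here is a minimal sketch that lists the age the Try1 loop looks up on each iteration. It assumes a default integer index and rebuilds only the Age column from the question; the names `looked_up` and `distinct` are made up for illustration.

```python
import pandas as pd

# Only the Age column from the question; the other columns do not matter here.
data = pd.DataFrame({"Age": [20, 20, 30, 30, 30]})

# The age that the Try1 loop filters on, one entry per row.
looked_up = [int(data.loc[x, "Age"]) for x in range(len(data))]

# The distinct ages, i.e. the number of groups that are actually wanted.
distinct = data["Age"].unique().tolist()

print(looked_up)
print(distinct)
```

The first list has one entry per row, so every filter is repeated as often as its age occurs, while the second list has one entry per group.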

I think all you need is this:

unq_age = data['Age'].unique()
df1 = data.loc[data['Age'] == unq_age[0]]
df2 = data.loc[data['Age'] == unq_age[1]]

df1
   #  Person1 Person2  Age
0  1     Alex   Maria   20
1  2     Paul   Peter   20

df2
   #  Person1 Person2  Age
2  3    Klaus    Hans   30
3  4   Victor    Otto   30
4  5    Gerry  Justin   30
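With more than two distinct ages, indexing `unq_age` by hand does not scale. As the comments under the question suggest, `groupby` can build one dataframe per age in a single pass. A minimal sketch, assuming the dataframe is called `data` and the column is named `Age` as in the question; the dictionary name `dfs` is only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question.
data = pd.DataFrame({
    "#": [1, 2, 3, 4, 5],
    "Person1": ["Alex", "Paul", "Klaus", "Victor", "Gerry"],
    "Person2": ["Maria", "Peter", "Hans", "Otto", "Justin"],
    "Age": [20, 20, 30, 30, 30],
})

# One sub-dataframe per distinct age, keyed by the age value.
dfs = {age: group for age, group in data.groupby("Age")}

# Print each group exactly once, however many ages there are.
for age, group in dfs.items():
    print(f"Age {age}:")
    print(group)
```

Each group can then be looked up by its age, for example `dfs[20]` or `dfs[30]`.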
Trenton McKinney
sharathnatraj