I have a lot of rows in my pandas dataframe, and I want to make a new dataframe for each different ID present in it, containing all the rows for that ID.

This is the basic layout of the data that I have:

ID name score
1 a Three
1 b Three
2 c Three
1 d Three
3 e Three
5 f Three

And this is what I am trying to get:

ID name score
1 a Three
1 b Three
1 d Three

And so on for every ID. Please help.

AMAN MALIK
  • So you want to have a list of dataframes? –  Feb 02 '22 at 16:14
  • @richardec Yes, I want to create a new dataframe for each ID in the data, but because there are more than 2000 different IDs and they are random rather than sequential, I can't find a way to do it. – AMAN MALIK Feb 02 '22 at 16:18
  • `groupby` is the way to go. Check my answer. –  Feb 02 '22 at 16:18

1 Answer

Just use a list comprehension with `groupby`:

dataframes = [d for _, d in df.groupby('ID')]

Output:

>>> dataframes
[   ID name  score
 0   1    a  Three
 1   1    b  Three
 3   1    d  Three,
    ID name  score
 2   2    c  Three,
    ID name  score
 4   3    e  Three,
    ID name  score
 5   5    f  Three]
 
 
>>> dataframes[0]
   ID name  score
0   1    a  Three
1   1    b  Three
3   1    d  Three

>>> dataframes[1]
   ID name  score
2   2    c  Three
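
Since the IDs in the question are not sequential, looking frames up by position in a list may be awkward. A minimal sketch of one alternative, assuming the same sample data as the question: build a dict keyed by ID instead of a list. The name `frames_by_id` is just an illustrative choice, not something from the question.

```python
import pandas as pd

# Sample data mirroring the layout in the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 3, 5],
    "name": ["a", "b", "c", "d", "e", "f"],
    "score": ["Three"] * 6,
})

# Map each ID to the sub-dataframe holding all of its rows
frames_by_id = {group_id: group for group_id, group in df.groupby("ID")}

# Look up the rows for a given ID directly, without knowing its position
print(frames_by_id[1])
```

With this, `frames_by_id[5]` gives the rows for ID 5 regardless of how many other IDs exist or what order they appear in.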
  • Thank you so much. I should learn `groupby` by heart. But why is there a `_`? Does that mean something? – AMAN MALIK Feb 02 '22 at 16:32
  • `df.groupby('ID')` returns a GroupBy object, and iterating over it yields tuples of two items: the first is the unique value for that group, the second is the dataframe itself. `_` is just a dummy variable name that's often used to indicate the variable is not used. –  Feb 02 '22 at 16:34
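
To make the point about the two-item tuples concrete, here is a small sketch, assuming the same sample data as the question, that loops over the groups and names both items instead of discarding the first with `_`. The variable names `key`, `group`, and `pairs` are illustrative.

```python
import pandas as pd

# Sample data mirroring the layout in the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 3, 5],
    "name": ["a", "b", "c", "d", "e", "f"],
    "score": ["Three"] * 6,
})

# Each iteration yields a (key, sub-dataframe) pair
pairs = list(df.groupby("ID"))

for key, group in pairs:
    # key is the ID value; group is the dataframe of rows with that ID
    print(key, type(group).__name__, list(group["name"]))

# If only the keys are needed, the second item can be ignored the same way
keys = [key for key, _ in pairs]
```

When the key is not needed, as in the answer's list comprehension, writing `_` in its place signals that it is deliberately unused.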