I'm looking for a more automated way to subset this dataframe by rank and put the subsets in a list. If there happen to be 150 ranks, I can't write each subset by hand.

ID    |  GROUP   |  RANK
1     |    A     |    1
2     |    B     |    2
3     |    C     |    3
2     |    A     |    1
2     |    E     |    2
2     |    G     |    3

How can I subset the dataframe by RANK and then put every subset in a list? (Not using group by) I know how to subset them individually, but I'm not sure how to do this when there are more ranks.

Output:

ranks = [df1,df2,df3....and so on]
its.kcl

1 Answer

Just use groupby directly in a list comprehension:

>>> [df for rank, df in df.groupby('RANK')]

This will generate a list of dataframes, each a sub-dataframe related to the corresponding rank.

You can also do a dict comprehension:

>>> dic = {rank: df for rank, df in df.groupby('RANK')}

such that you can access your df via dic[1] for rank == 1.
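
As a minimal sketch of both comprehensions, here they are run on the sample data from the question. The names `sub_df` and `by_rank` are illustrative only; they are used instead of `df` and `dic` so the loop variable does not share a name with the original dataframe.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 2, 2, 2],
    "GROUP": ["A", "B", "C", "A", "E", "G"],
    "RANK": [1, 2, 3, 1, 2, 3],
})

# One sub-dataframe per distinct RANK value (groupby sorts the keys by default)
ranks = [sub_df for _, sub_df in df.groupby("RANK")]

# The same subsets, keyed by rank value
by_rank = {rank: sub_df for rank, sub_df in df.groupby("RANK")}

print(len(ranks))                     # 3
print(by_rank[1]["GROUP"].tolist())   # ['A', 'A']
```

The list keeps the subsets in ascending rank order, while the dict lets you look one up by its rank value.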


In more detail, pd.DataFrame.groupby is a method that returns a DataFrameGroupBy object. A DataFrameGroupBy object is an iterable, which means you can iterate over it with a for loop. Iterating over it yields tuples of two values: the first is whatever you grouped by (in this case, an integer rank), and the second is the sub-dataframe for that group.
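
To make the tuples concrete, here is a small sketch that loops over the groupby object without building a list first. The `seen` list is a hypothetical helper, there only to record what each iteration received.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 2, 2, 2],
    "GROUP": ["A", "B", "C", "A", "E", "G"],
    "RANK": [1, 2, 3, 1, 2, 3],
})

seen = []  # hypothetical helper: records (rank, row count) per group
for rank, sub_df in df.groupby("RANK"):
    # rank is the group key; sub_df holds the rows where RANK == rank
    seen.append((rank, len(sub_df)))
    print(rank, sub_df["GROUP"].tolist())
# 1 ['A', 'A']
# 2 ['B', 'E']
# 3 ['C', 'G']
```

Groups arrive in sorted key order by default, so any per-rank processing placed in the loop body runs for rank 1 first, then rank 2, and so on.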

rafaelc
  • sorry, can you expand this more? I don't understand – its.kcl Jun 03 '22 at 15:33
  • @its.kcl `groupby` returns a list of tuples, where the first item of each tuple is a unique value of `RANK` (in this case), and the second item of each tuple is a subset of the dataframe where `RANK == ` –  Jun 03 '22 at 15:34
  • `[df for rank, df in df.groupby('RANK')]` just takes each dataframe from each tuple and puts them all in a list together. –  Jun 03 '22 at 15:35
  • For the dic approach, say I want to access each rank's dataframe one by one. That is possible, right? – its.kcl Jun 03 '22 at 15:35
  • @its.kcl yes, it's possible. Play with it and you'll get the intuition very fast ;) – rafaelc Jun 03 '22 at 15:38
  • So what I'm trying to do here is this: I have an algorithm that runs on df1, and its output contains updates for df2; then it runs on df2, and so on. That's why I'm asking whether it's possible to iterate through this in groups of dataframes. – its.kcl Jun 03 '22 at 15:39
  • You don't _have_ to [create a list from the groupby](/a/50866760/15497888). You can just use it as a loop (See [this answer](/a/27422749/15497888) for example) – Henry Ecker Jun 03 '22 at 15:43