Python Pandas GroupBy get list of groups

Question

I have a line of code:

g = x.groupby('Color')

The colors are Red, Blue, Green, Yellow, Purple, Orange, and Black. How do I return this list? For similar attributes, I use x.Attribute and it works fine, but x.Color doesn't behave the same way.

You can get the unique values from your orig df, no need to group `x['Color'].unique()` — EdChum, Mar 04 '15 at 08:50
The x['Color'].unique ended up being exactly what I was looking for. Thank you. — user3745115, Mar 05 '15 at 02:34

score 118 · Accepted Answer · edited Nov 21 '16 at 13:55


There is a much easier way of doing it:

g = x.groupby('Color')

g.groups.keys()

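Calling groupby() returns a DataFrameGroupBy object, not a dict. Its groups attribute, however, is a dict that maps each group name to the index labels of the rows in that group, so you can get the group names with the dict's built-in keys() method.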
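A minimal sketch showing what groupby() actually returns and what .groups holds. The small DataFrame below is a hypothetical stand-in for the question's x, not data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame `x`
x = pd.DataFrame({
    "Color": ["Red", "Blue", "Red", "Green", "Blue"],
    "Value": [1, 2, 3, 4, 5],
})

g = x.groupby("Color")
print(type(g).__name__)  # DataFrameGroupBy, not a dict

# g.groups is a dict-like mapping: group name -> index labels of that group's rows
for name, labels in g.groups.items():
    print(name, list(labels))

# The group names themselves, as a plain list
names = list(g.groups)
print(names)
```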

edited Nov 21 '16 at 13:55

DavidG


answered Nov 21 '16 at 13:23

Yanqi Ma


This is much more `pandorable` than other answers. :) – Peaceful Dec 23 '16 at 06:18

Please look at Erik Swan's answer below before you make a decision on which method to use. If consistent ordering of group names is an issue, go for Erik's way. – rocarvaj Sep 22 '18 at 16:19

`groupby()` does not return a `dict`, but a `DataFrameGroupBy` object. – HelloGoodbye Jan 30 '19 at 10:24

In Python3.x the above code will throw a TypeError and `list(g.groups)` would be preferred, see also [the accepted answer in this question](https://stackoverflow.com/questions/18552001/accessing-dict-keys-element-by-index-in-python3) – Adriaan Apr 10 '20 at 10:05
@Adriaan I get no errors when running this on Python 3.10.1, maybe an update changed that? – Lfppfs Dec 21 '21 at 13:59
Perhaps. As I understood from the comments in the question and answer I referred to, it does not throw an error all the time. The list()-based solution is still preferred. – Adriaan Mar 23 '22 at 13:16
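The TypeError discussed in these comments most likely comes from indexing the keys view (for example g.groups.keys()[0]), which Python 3 does not allow, rather than from calling keys() itself. A small sketch on hypothetical data illustrating the difference:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Red"]})  # hypothetical data
g = df.groupby("Color")

keys_view = g.groups.keys()  # a dict_keys view; calling keys() does not raise

indexing_failed = False
try:
    keys_view[0]  # dict_keys views are not subscriptable in Python 3
except TypeError as exc:
    indexing_failed = True
    print("TypeError:", exc)

# Converting to a list first makes positional access work
names = list(g.groups)
first = names[0]
print(first)
```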

score 39 · Answer 2 · Erik Swan · 2022-08-05T18:36:31.867

If you do not care about the order of the groups, Yanqi Ma's answer will work fine:

g = x.groupby('Color')
g.groups.keys()
list(g.groups) # or this

However, note that g.groups is a dictionary, so in Python <3.7 the keys are inherently unordered! This is the case even if you pass sort=True (the default) to groupby to sort the groups.

This actually bit me hard when it resulted in a different order on two platforms, especially since I was using list(g.groups), so it wasn't obvious at first that g.groups was a dict.

In my opinion, the best way to do this is to take advantage of the fact that the GroupBy object has an iterator, and use a list comprehension to return the groups in the order they exist in the GroupBy object:

g = x.groupby('Color')
groups = [name for name,unused_df in g]

It's a little less readable, but this will always return the groups in the correct order.
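To make the ordering concrete, here is a sketch on hypothetical data (colors deliberately not in alphabetical order) comparing the iteration order under the default sort=True and under sort=False:

```python
import pandas as pd

# Hypothetical data: colors deliberately not in alphabetical order
x = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Red", "Blue"],
                  "Value": range(5)})

# Default sort=True: groups are iterated in sorted key order
g = x.groupby("Color")
sorted_names = [name for name, _ in g]

# sort=False: groups are iterated in order of first appearance
g_unsorted = x.groupby("Color", sort=False)
appearance_names = [name for name, _ in g_unsorted]

print(sorted_names)
print(appearance_names)
```

The iteration form takes its order from the GroupBy object itself, so it does not rely on how the Python version orders dict keys.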

edited Aug 05 '22 at 18:36

answered May 23 '17 at 00:30

Erik Swan


Just wondering, how can I find out the attributes of a GroupBy object? I assumed name would be one of them, but I could not find relevant information in the pandas documentation. – s666 Jun 11 '20 at 07:51
All of the methods and attributes of the GroupBy object are documented in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/groupby.html). – Erik Swan Dec 22 '20 at 09:51

The above concerns hold for Python versions prior to 3.7. For newer Python versions, dictionary keys _are_ [(insertion) ordered](https://stackoverflow.com/questions/39980323). I expect that `list(g.groups)==[name for name,_ in g]` is True, regardless of whether `sort=True` or `sort=False`. – normanius Jun 05 '21 at 00:59

Although the Pandas documentation doesn't explicitly state that, I agree that is probably true. Good to know this type of mistake is harder to make in Python 3.7+. – Erik Swan Jun 17 '21 at 02:37

score 7 · Answer 3 · answered Mar 04 '15 at 00:52


Here's how to do it.

groups = list()
for g, data in x.groupby('Color'):
    print(g, data)
    groups.append(g)

The core idea here is this: if you iterate over a DataFrameGroupBy object, each iteration yields a two-tuple (group name, filtered DataFrame), where the filtered DataFrame contains only the rows belonging to that group.
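A short sketch of that tuple structure, using a hypothetical two-column DataFrame in place of x:

```python
import pandas as pd

x = pd.DataFrame({"Color": ["Red", "Blue", "Red"], "Size": [1, 2, 3]})  # hypothetical data

sizes = {}
for item in x.groupby("Color"):
    # Each item is a 2-tuple: (group name, sub-DataFrame for that group)
    name, data = item
    # Every row of the sub-DataFrame belongs to the named group
    assert (data["Color"] == name).all()
    sizes[name] = len(data)

print(sizes)
```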

answered Mar 04 '15 at 00:52

ericmjl


Alternatively, if you want to get the unique values present in each column, you can do `numpy.unique(x[col_name].values)` – ericmjl Mar 04 '15 at 00:53
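numpy.unique returns the distinct values sorted, which is the same order groupby uses for group names by default. A sketch on hypothetical data comparing the two:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"Color": ["Red", "Blue", "Red", "Green"]})  # hypothetical data

# numpy.unique returns the distinct values as a sorted array
np_names = np.unique(x["Color"].values)

# With the default sort=True, the groupby group names come out in the same sorted order
group_names = list(x.groupby("Color").groups)

print(np_names.tolist(), group_names)
```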

Zythyr · Answer 4 · 2015-07-04T01:36:32.363

It is my understanding that you have a DataFrame which contains multiple columns. One of the columns is "Color", which holds different colors. You want to return a list of the unique colors that exist.

colorGroups = df.groupby('Color')
for c in colorGroups.groups:  # iterating a dict yields its keys, i.e. the group names
    print(c)

The above code prints every color that exists, with no color name repeated. With a recent pandas on Python 3.7+, where groupby sorts the group names by default, you should get output such as:

Black
Blue
Green
Orange
Purple
Red
Yellow

An alternative is the Series.unique() method, which returns an array of the unique values in a Series, in order of first appearance. Thus to get an array of all unique colors, you would do:

df['Color'].unique()

The output is an array, so for example print(df['Color'].unique()[3]) would give you Yellow if the colors first appear in the data in the order Red, Blue, Green, Yellow.
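Because unique() preserves the order in which values first appear, the position of a color in the result depends on the data. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Yellow", "Red", "Blue"]})  # hypothetical data

colors = df["Color"].unique()  # distinct values, in order of first appearance
print(colors[3])

# tolist() gives a plain Python list if an array is not what you need
color_list = df["Color"].unique().tolist()
print(color_list)
```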

score 5 · Answer 5 · Itai Roth · 2020-04-30T06:22:43.563

I compared runtime for the solutions above (with my data):

In [443]: d = df3.groupby("IND")

In [444]: %timeit groups = [name for name,unused_df in d]
377 ms ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [445]: %timeit list(d.groups)
1.08 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [446]: %timeit d.groups.keys()
708 ns ± 7.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [447]: %timeit df3['IND'].unique()
5.33 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It seems that d.groups.keys() is the fastest method, at least on this data.
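A rough way to rerun the comparison on your own data. The DataFrame here is hypothetical (many rows, few groups), and timings will vary with machine, pandas version and data shape. The sketch builds a fresh GroupBy inside each timed call, on the assumption that .groups may be cached on a GroupBy object after its first use, which would make repeated timings on the same object look artificially fast.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: many rows, few groups
rng = np.random.default_rng(0)
df = pd.DataFrame({"IND": rng.integers(0, 50, size=100_000)})

# Each candidate builds its own GroupBy, so no cached result is reused between calls
candidates = {
    "iterate": lambda: [name for name, _ in df.groupby("IND")],
    "list(groups)": lambda: list(df.groupby("IND").groups),
    "unique": lambda: df["IND"].unique().tolist(),
}

# Check that all approaches find the same group names (unique() is unsorted)
results = {label: fn() for label, fn in candidates.items()}

for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{label:>14}: {seconds * 1e3:.2f} ms per call")
```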

edited Apr 30 '20 at 06:22

answered Apr 28 '20 at 09:06

Itai Roth


Please post the entire used command and your results, if you want to write an answer that is actually contributing. Otherwise use the `comment` option. – Py-ser Apr 28 '20 at 14:00
It's not that simple, runtime will depend on the structure of your data. In my case - a df with few groups but many members per group - I found the exact opposite result: the list comprehension was fastest (22 ms), while `df.groupby(..).groups.keys()` was slower: 124ms. – Marses Nov 10 '21 at 10:31
Note: in my experiment, the first time I run `d.groups.keys()` it is much slower (again 100-300 ms), but the second time it takes 4 ms. So your results may depend on nothing more than the order in which you run the timings. – Marses Nov 10 '21 at 10:35

score 0 · Answer 6 · answered Jul 27 '20 at 22:28


Hope this helps. Happy Coding :)

import pandas as pd

df = pd.DataFrame(data=[['red','1','1.5'],['blue','20','2.5'],['red','15','4']], columns=['color','column1','column2'])

# The keys of the .groups dict are the group names
list_req = list(df.groupby('color').groups.keys())
print(list_req)

answered Jul 27 '20 at 22:28

Induraj PR

