Pandas Groupby Select Groups that Have More Than One Unique Values in a Column

Question

I have a dataframe of some information about some artists, their albums, and their tracks.

df = pd.DataFrame({'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'], 'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312], 'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089]})

The Dataframe

Artist A has 2 albums(201 and 451), with one having 2 tracks(1022 and 3472) and 1 having 1 track(9866).

Artist B has 1 album(390) with 2 tracks(6078 and 2634).

Artist C has 2 albums(272 and 698), with each album having 2 tracks.

Artist D has 1 album(235) with 1 track(9874).

Artist E has 1 album(312) with 1 track(1089).

I want to find the artists who have more than 1 album, and get the rows of these artists accordingly. My desired output looks like this:

Desired Output

I have tried:

groupedArtists = data.groupby(['ArtistId', 'AlbumId']).filter(lambda group: (group.AlbumId.nunique() > 1))

But it seems not to work as expected.

Could someone please help me out? I appreciate it!

Welcome to Stack Overflow. It will be easier to assist you if you provide the data in a way that can be copied and pasted directly into a program, such as `df.head().to_dict('list')`. [This guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) has a lot of good advice for getting answers for Pandas questions. — Steele Farnsworth, Dec 05 '21 at 16:17

score 6 · Answer 1 · answered Dec 05 '21 at 16:28

You want to group by only ArtistId, and not AlbumId:

groupedArtists = data.groupby(['Artist']).filter(lambda x: x['AlbumId'].nunique() > 1)

Output:

>>> groupedArtists
  Artist  AlbumId  TrackId
0      A      201     1022
1      A      201     3472
2      A      451     9866
5      C      375     1022
6      C      412     9866
7      C      375     3472
...

Valdi_Bo · Answer 2 · 2021-12-05T16:35:22.340

3

Grouping should be solely by Artist.

Then, for each group, check how many (different) albums it contains and take only groups having more than 1 album.

So the proper solution is:

data.groupby('Artist').filter(lambda grp: grp.AlbumId.nunique() > 1)

edited Dec 05 '21 at 16:35

answered Dec 05 '21 at 16:30

Valdi_Bo

30,023
4
23
41

score 0 · Answer 3 · answered Dec 05 '21 at 16:40

0

This is the solution I found, which is a little more verbose, but perhaps more easily understandable:

counted = df.groupby(['Artist']).size().reset_index(name='counts')
df[df['Artist'].isin(counted[counted.counts > 1].Artist)]

answered Dec 05 '21 at 16:40

Shay Nehmad

1,103
1
12
25

score 0 · Answer 4 · answered Jun 20 '23 at 21:22

A late response, but an alternative to the other solutions is the following:

groupedArtists = data[data.groupby('Artist')['AlbumId'].transform('nunique').ne(1)]

This groups by Artist, like the others, but instead of applying a function for filtering, returns a pd.Series of number of unique Albums for each group, and returns only those rows where nunique is not equal to 1.

In a comparison in time to run, this is a bit faster:

# my solution
%timeit groupedArtists = data[data.groupby('Artist')['AlbumId'].transform('nunique').ne(1)]
1.08 ms ± 85.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# solution by @Valdi_Bo and @user17242583
%timeit groupedArtists = data.groupby(['Artist']).filter(lambda x: x['AlbumId'].nunique() > 1)
1.81 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# solution by @Shay Nehmad
%%timeit
counted = df.groupby(['Artist']).size().reset_index(name='counts')
groupedArtists df[df['Artist'].isin(counted[counted.counts > 1].Artist)]
2.03 ms ± 32.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

score -1 · Answer 5 · answered Dec 05 '21 at 17:09

-1

you can create a aggregated DataFrame based on counts of Albums and then filter on the number of albums you want

stats = df.groupby(['Artist'])['AlbumId'].count().reset_index()
morethan1 = stats.loc[stats['AlbumId'] >1]

answered Dec 05 '21 at 17:09

kwa

49
4

Pandas Groupby Select Groups that Have More Than One Unique Values in a Column

5 Answers5

Linked

Related