Filter and display all duplicated rows based on multiple columns in Pandas

Question

Given a dataset as follows:

    name     month  year
0    Joe  December  2017
1  James   January  2018
2    Bob     April  2018
3    Joe  December  2017
4   Jack  February  2018
5   Jack     April  2018

I need to filter and display all duplicated rows based on columns month and year in Pandas.

With code below, I get:

df = df[df.duplicated(subset = ['month', 'year'])]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)

Out:

   name     month  year
3   Joe  December  2017
5  Jack     April  2018

But I want the result as follows:

    name     month  year
0    Joe  December  2017
1    Joe  December  2017
2    Bob     April  2018
3   Jack     April  2018

How could I do that in Pandas?

Does this answer your question? [How do I get a list of all the duplicate items using pandas in python?](https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python) You opened a new question and you answered to it without checking if it already exists? — Riccardo Bucco, Aug 11 '20 at 13:14
It's exactly the same... https://stackoverflow.com/a/54050797/5296106 — Riccardo Bucco, Aug 11 '20 at 13:16

score 1 · Accepted Answer · answered Aug 11 '20 at 13:09

1

The following code works, by adding keep = False:

df = df[df.duplicated(subset = ['month', 'year'], keep = False)]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)

answered Aug 11 '20 at 13:09

ah bon

9,293
12
65
148

Filter and display all duplicated rows based on multiple columns in Pandas

1 Answers1