1

Given a dataset as follows:

    name     month  year
0    Joe  December  2017
1  James   January  2018
2    Bob     April  2018
3    Joe  December  2017
4   Jack  February  2018
5   Jack     April  2018

I need to filter and display all duplicated rows based on columns month and year in Pandas.

With code below, I get:

df = df[df.duplicated(subset = ['month', 'year'])]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)

Out:

   name     month  year
3   Joe  December  2017
5  Jack     April  2018

But I want the result as follows:

    name     month  year
0    Joe  December  2017
1    Joe  December  2017
2    Bob     April  2018
3   Jack     April  2018

How could I do that in Pandas?

ah bon
  • 9,293
  • 12
  • 65
  • 148
  • 1
    Does this answer your question? [How do I get a list of all the duplicate items using pandas in python?](https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python) You opened a new question and you answered to it without checking if it already exists? – Riccardo Bucco Aug 11 '20 at 13:14
  • It seems not check duplicated based on multiple columns – ah bon Aug 11 '20 at 13:15
  • It's exactly the same... https://stackoverflow.com/a/54050797/5296106 – Riccardo Bucco Aug 11 '20 at 13:16

1 Answers1

1

The following code works, by adding keep = False:

df = df[df.duplicated(subset = ['month', 'year'], keep = False)]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)
ah bon
  • 9,293
  • 12
  • 65
  • 148