How to select all partially duplicated rows in a table with pandas?

Asked Jun 27 '23 at 11:30

Active Jun 27 '23 at 11:55

This is a simple example of the problem at hand. Suppose I have the following table:

df1
  Column1  Column2 Column3
0     cat        a       1
1     dog        b       4
2     cat        b       2
3     bird       a       3
4     cat        a       2
5     dog        b       3

I want to get all the rows that are duplicated with respect to Column1 and Column2 together, i.e. every row whose (Column1, Column2) pair appears more than once.

My attempt was as follows:

`df1[df1['Column1'].isin(df1[df1[['Column1','Column2']].duplicated()]['Column1']) & df1['Column2'].isin(df1[df1[['Column1','Column2']].duplicated()]['Column2'])]`

This gives the following output:

  Column1  Column2 Column3
0     cat        a       1
1     dog        b       4
2     cat        b       2
4     cat        a       2
5     dog        b       3
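
A likely reason for the extra row: the two `isin` checks run on each column independently, so row 2 (cat, b) passes because "cat" and "b" each appear somewhere among the duplicated rows, even though the pair (cat, b) occurs only once. The sketch below rebuilds df1 from the table above (assuming Column3 holds integers) and splits the attempt into its two masks to make this visible:

```python
import pandas as pd

# Rebuild df1 from the table in the question (Column3 assumed to be int).
df1 = pd.DataFrame({
    "Column1": ["cat", "dog", "cat", "bird", "cat", "dog"],
    "Column2": ["a", "b", "b", "a", "a", "b"],
    "Column3": [1, 4, 2, 3, 2, 3],
})

# duplicated() with the default keep='first' flags only the later
# occurrences of each repeated pair: rows 4 and 5.
dups = df1[df1[["Column1", "Column2"]].duplicated()]

# The original attempt: each column is tested on its own.
col1_ok = df1["Column1"].isin(dups["Column1"])  # "cat" or "dog"
col2_ok = df1["Column2"].isin(dups["Column2"])  # "a" or "b"
attempt = df1[col1_ok & col2_ok]

# Row 2 (cat, b) passes both independent checks, although the pair
# (cat, b) itself occurs only once.
print(attempt)
```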

But the desired output is:

  Column1  Column2 Column3
0     cat        a       1
1     dog        b       4
4     cat        a       2
5     dog        b       3
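
One way to keep the `isin` idea but compare whole pairs is to turn (Column1, Column2) into a single key. This sketch assumes a MultiIndex built with `pd.MultiIndex.from_frame` as that key; it is one illustration, not the only approach:

```python
import pandas as pd

# Rebuild df1 from the table in the question (Column3 assumed to be int).
df1 = pd.DataFrame({
    "Column1": ["cat", "dog", "cat", "bird", "cat", "dog"],
    "Column2": ["a", "b", "b", "a", "a", "b"],
    "Column3": [1, 4, 2, 3, 2, 3],
})

# Treat (Column1, Column2) as one key by building a MultiIndex from both columns.
pairs = pd.MultiIndex.from_frame(df1[["Column1", "Column2"]])

# Keys that occur more than once (duplicated() marks repeats after the first).
dup_pairs = pairs[pairs.duplicated()]

# Keep every row whose *pair* is among the repeated keys.
result = df1[pairs.isin(dup_pairs)]
print(result)
```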

edited Jun 27 '23 at 11:55

Amr AlBarqawy

`df1.merge(df2)` – mozway Jun 27 '23 at 11:31
Regarding your update: `df1[df1.duplicated(subset=['Column1', 'Column2'], keep=False)]` – mozway Jun 27 '23 at 11:46
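
The one-liner in the comment above can be checked against the desired output with a minimal, self-contained sketch (df1 rebuilt from the question's table). The key detail is `keep=False`, which flags every member of a repeated pair; the default `keep='first'` leaves the first occurrence unflagged, which is why rows 0 and 1 would be missing with a plain `duplicated()`:

```python
import pandas as pd

# Rebuild df1 from the table in the question (Column3 assumed to be int).
df1 = pd.DataFrame({
    "Column1": ["cat", "dog", "cat", "bird", "cat", "dog"],
    "Column2": ["a", "b", "b", "a", "a", "b"],
    "Column3": [1, 4, 2, 3, 2, 3],
})

# keep=False marks *every* row whose (Column1, Column2) pair repeats.
mask = df1.duplicated(subset=["Column1", "Column2"], keep=False)
result = df1[mask]
print(result)

# For contrast: the default keep='first' flags only the later repeats.
later_only = df1[df1.duplicated(subset=["Column1", "Column2"])]
print(later_only)
```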