
Here is an example of the data:

import pandas as pd
df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

    file  prop1  prop2  prop3
0  file1   True  False  False
1  file2  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file3  False   True  False
5  file3  False  False   True
6  file4  False   True  False
7  file5   True  False  False
8  file4  False   True  False
9  file5  False  False   True

I need to move the duplicated rows (rows with the same prop values) into another dataframe and cut them from the original one.
The other dataframe should look like this (each duplicated row should appear only once):

    file  prop1  prop2  prop3
0  file1   True  False  False
3  file2  False  False   True
8  file4  False   True  False

df = df.drop_duplicates() drops only one row of each duplicated pair and keeps the other, like this:

    file  prop1  prop2  prop3
0  file1   True  False  False
1  file2  False  False   True
4  file3  False   True  False
5  file3  False  False   True
6  file4  False   True  False
7  file5   True  False  False
9  file5  False  False   True
Contra111

2 Answers

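# Keep the first occurrence of every row
uniques = df.drop_duplicates()
# Rows removed by drop_duplicates; select by label (.loc), not by position
duplicates = df.loc[df.index.difference(uniques.index)]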

You can first use the pandas method drop_duplicates() to create a dataframe with only the unique rows. Comparing the index of your original dataframe with the index of that frame then gives the 'dropped' labels, which are your duplicate rows. Selecting those labels from the original dataframe leaves you with the unique rows and the duplicated rows separated.
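As a variation on the same idea, here is a minimal sketch built on DataFrame.duplicated with keep=False. It assumes the goal is to remove every copy of a repeated row from the original and keep one copy in the second frame; the names is_dup, duplicates and remaining are only illustrative. Note that it keeps the first copy of each group, so the index labels differ from the ones shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

# True for every row that has at least one identical twin (all copies are flagged)
is_dup = df.duplicated(keep=False)

# One representative per group of repeated rows (the first copy is kept here)
duplicates = df[is_dup].drop_duplicates()

# Rows that never repeat; assumption: all copies should leave the original
remaining = df[~is_dup]

print(duplicates)
print(remaining)
```

If the comparison should ignore the file column, passing subset= with the prop columns to both calls would be the natural adjustment.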


Use DataFrame.drop_duplicates and specify the column names to compare, here by selecting all columns except the first:

df = df.drop_duplicates(df.columns[1:])

Or select the columns with prop in their names:

df = df.drop_duplicates(df.filter(like='prop').columns)

print (df)
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file2  False  False   True
4  file3  False   True  False
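
If the removed rows are also needed, they can be recovered with DataFrame.duplicated on the same column subset. This is only a sketch under that assumption; prop_cols, kept and dropped are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

# Columns used for the comparison: everything with "prop" in its name
prop_cols = df.filter(like='prop').columns

# True for each row whose prop values were already seen in an earlier row
mask = df.duplicated(subset=prop_cols)

kept = df[~mask]      # same rows that drop_duplicates(subset=prop_cols) keeps
dropped = df[mask]    # the rows that drop_duplicates would discard

print(kept)
print(dropped)
```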
jezrael