I am looking for an efficient and elegant way in Pandas to remove "duplicate" rows in a DataFrame, where two rows count as duplicates if they contain exactly the same set of values but in different columns.

Ideally I would like a vectorized approach, as I can already see very inefficient ways of doing this with pandas.DataFrame.iterrows().

Say my DataFrame is:

| source | target |
|--------|--------|
| 1      | 2      |
| 2      | 1      |
| 4      | 3      |
| 2      | 7      |
| 3      | 4      |

I want it to become:

| source | target |
|--------|--------|
| 1      | 2      |
| 4      | 3      |
| 2      | 7      |
rafaelc
Noelmas
  • This is a duplicate; many questions ask about this. Take a look at, for example, https://stackoverflow.com/questions/51603520/pandas-remove-duplicates-that-exist-in-any-order – rafaelc Apr 02 '19 at 17:30
  • This is a duplicate indeed. Your answer is in the link RafaelC provided. The solution is: `pd.DataFrame(np.sort(df.values, axis=1), columns=df.columns).drop_duplicates()` – Erfan Apr 02 '19 at 17:32
  • Many thanks, sorry for not spotting this – Noelmas Apr 02 '19 at 17:33
  • Possible duplicate of [Sorting df rows horizontally](https://stackoverflow.com/questions/38884131/sorting-df-rows-horizontally) – Erfan Apr 02 '19 at 17:33
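
For reference, here is a minimal sketch of what the one-liner from the comment above does on the question's data. The DataFrame construction is an assumption reconstructed from the table in the question. Note that this variant returns the row-sorted values, so the pair (4, 3) comes back as (3, 4) rather than in its original order.

```python
import numpy as np
import pandas as pd

# Example data reconstructed from the question (an assumption about how it was built).
df = pd.DataFrame({"source": [1, 2, 4, 2, 3], "target": [2, 1, 3, 7, 4]})

# Sort the values inside each row, so (2, 1) and (1, 2) become identical rows,
# then drop the repeated rows. The result holds the *sorted* values.
normalized = pd.DataFrame(np.sort(df.values, axis=1), columns=df.columns).drop_duplicates()

print(normalized)
```

If the original left/right order of each pair matters, the sorted frame can be used only to decide which rows to keep, rather than as the result itself.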

1 Answer

df = df[~pd.DataFrame(np.sort(df.values,axis=1)).duplicated()]

   source  target
0       1       2
2       4       3
3       2       7

Explanation:

np.sort(df.values, axis=1) sorts the values within each row (that is, along the column axis), so every row lists its values in ascending order:

array([[1, 2],
       [1, 2],
       [3, 4],
       [2, 7],
       [3, 4]], dtype=int64)

Next, build a DataFrame from the sorted array and call duplicated() on it. The ~ prefix negates the result, so True marks the rows that are not repeats of an earlier row:

~pd.DataFrame(np.sort(df.values,axis=1)).duplicated()

0     True
1    False
2     True
3     True
4    False
dtype: bool

Finally, use this as a boolean mask on df to get the final output:

   source  target
0       1       2
2       4       3
3       2       7
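
The steps above can be wrapped into a small helper. This is only a sketch: the function name `drop_unordered_duplicates` and its `cols` parameter are made up for illustration. It also converts the mask to a NumPy array, because the mask built from a fresh DataFrame carries a default 0..n-1 index, and a boolean Series is aligned by label when it is used to index df. With the default index in the example above that makes no difference, but it would with a custom index.

```python
import numpy as np
import pandas as pd


def drop_unordered_duplicates(df, cols):
    """Keep the first row for each unordered combination of values in `cols`.

    Hypothetical helper name, not part of pandas.
    """
    # Sort the values within each row so that column order no longer matters.
    sorted_values = np.sort(df[cols].to_numpy(), axis=1)
    # Flag rows whose sorted values were already seen. Converting to a plain
    # NumPy array makes the mask positional instead of index-label based.
    is_repeat = pd.DataFrame(sorted_values).duplicated().to_numpy()
    return df[~is_repeat]


df = pd.DataFrame(
    {"source": [1, 2, 4, 2, 3], "target": [2, 1, 3, 7, 4]},
    index=list("abcde"),  # non-default index, chosen to exercise the positional mask
)
result = drop_unordered_duplicates(df, ["source", "target"])
print(result)
```

The surviving rows keep their original values and index labels, so (4, 3) stays as (4, 3).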
Akhilesh_IN
  • Hi Akhilesh, while this may be the correct answer, you should add a little insight or explanation of what you have done here to make it a quality answer that helps others understand the root cause of the problem. – nircraft Apr 02 '19 at 20:19
  • @nircraft Thank you for pointing it out. Please check the updates. – Akhilesh_IN Apr 03 '19 at 03:18