
I have an undirected network of connections in a dataframe.

    Source_ID    Target_ID
0       1            5
1       7            2     
2       12           6
3       3            9
4       16           11
5       2            7 <------The same as row 1
6       4            8
7       5            1 <------The same as row 0
8       99           81

But since this is an undirected network, row 0 and row 7 are technically the same, as are row 1 and row 5. df.drop_duplicates() does not eliminate these as duplicates because it sees them as two distinct rows, at least in my attempts so far.

I also tried what I thought should work: comparing Source_ID and Target_ID and dropping every row where Target_ID is lower than Source_ID, so that Source_ID is always the "lower" of the two. But that didn't produce the results I needed either.

df.drop(df.loc[df['Target_ID'] < df['Source_ID']]
        .index.tolist(), inplace=True)

Therefore, I need to figure out a way to drop the duplicate connections (while keeping the first) such that my fixed dataframe looks like (after an index reset):

    Source_ID    Target_ID
0       1            5
1       7            2     
2       12           6
3       3            9
4       16           11
5       4            8
6       99           81
DrakeMurdoch
  • Related to [this question](https://stackoverflow.com/questions/58592606/find-symmetric-pairs-quickly-in-numpy/58592764#58592764) – Quang Hoang Nov 28 '19 at 03:38
  • convert the dataframe to frozenset and then to dataframe. https://docs.python.org/3/library/stdtypes.html#frozenset – Rajat Mishra Nov 28 '19 at 04:29

1 Answer


Certainly not the most efficient, but might do the job:

# Build an order-independent key per row, then keep the first occurrence of each key
key = df.apply(lambda row: tuple(sorted(row)), axis=1)
df[~key.duplicated()].reset_index(drop=True)
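
For larger frames, a vectorized approach may be worth trying: sort the two IDs within each row so that both orientations of an edge yield the same key, then keep the first occurrence of each key. The following is only a minimal sketch, assuming both ID columns hold mutually comparable values (integers here). The names `key` and `deduped` are illustrative, and the sample frame is rebuilt from the data in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Source_ID": [1, 7, 12, 3, 16, 2, 4, 5, 99],
    "Target_ID": [5, 2, 6, 9, 11, 7, 8, 1, 81],
})

# Sort the two IDs within each row so that (5, 1) and (1, 5) give the same key.
# The original df is left untouched, so the kept rows retain their orientation.
key = pd.DataFrame(
    np.sort(df[["Source_ID", "Target_ID"]].to_numpy(), axis=1),
    index=df.index,
)

# duplicated() flags every repeat after the first occurrence (keep="first" by default)
deduped = df[~key.duplicated()].reset_index(drop=True)
print(deduped)
```

Because the sorted key is only used to find duplicates, the surviving rows come from the original frame unchanged, so (7, 2) stays (7, 2) rather than being rewritten as (2, 7).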
Haliaetus