remove duplicated list in python

Question

I want to remove duplicate lists in my data after I use stack(). This is example

node1 node2  value
A     B      0.05
A     C      0.84
B     A      0.05
B     C      0.97
C     A      0.84
C     B      0.97

into

node1 node2  value
A     B      0.05
A     C      0.84
B     C      0.97

Thanks

Edit: I attach a photo of my data. Row 2 and 174 have same node but different position. I want to keep only one interaction (row2) Ps. the value aren't unique.

Sorry for few explanations. The interactions from A --> B or B --> A are similar and I want to pick only one list. — Maetha, Mar 18 '20 at 08:58
Would C always be the same where node1 and node2 are equal? Also can you post what you have tried yourself? — JvdV, Mar 18 '20 at 08:58

JvdV · Accepted Answer · 2020-03-18T10:35:02.203

Right, So I'm a Python rookie/beginner but tried a few different things with bits and pieces from what I could find:

import pandas as pd
import numpy as np
df = pd.DataFrame({'node1':['A','A','B','B','C','C'],
                   'node2':['B','C','A','C','A','B'],
                   'value':[0.05,0.84,0.05,0.97,0.84,0.97]})
df = df[['value','node1','node2']]                                                         #Step 1: re-arange dataframe to work properly
df['value'] = df['value'].astype(str)                                                      #Step 2: float type (numbers) to string type
df = pd.DataFrame(np.sort(df.values, axis=1), columns=df.columns).drop_duplicates()        #Step 3: Sort and drop dups
df['value'] = df['value'].astype(float)                                                    #Step 4: string type back to float
df = df[['node1','node2','value']]                                                         #Step 5: re-arange dataframe back to starting positions
print(df)

I used and want to refer to:

This Pandas documentation on Panda's .to_string function
This SO post to re-position columns as I found out np.sort didn't shift headers.
This SO post to sort values and drop duplicates.

I'm a 100% sure that this can be shortened to a smoother piece of code by the right person. Hope it helps in the meantime!

score 0 · Answer 2 · answered Mar 20 '20 at 07:02

Finally I find other ways in excel and I solved it using if-else function

node1 node2  value
A     B      0.05
A     C      0.84
B     A      0.05
B     C      0.97
C     A      0.84
C     B      0.97

I add new column and compare between col1 and col2 which one have higher value if(A2>B2,1,0)

node1 node2  value   cond
A     B      0.05    0
A     C      0.84    0
B     A      0.05    1
B     C      0.97    0
C     A      0.84    1
C     B      0.97    1

and I swap col1 and col2 which cond=1

node1 node2  value   cond
A     B      0.05    0
A     C      0.84    0
A     B      0.05    1
B     C      0.97    0
A     C      0.84    1
B     C      0.97    1

Finally I remove duplicates in table

node1 node2  value
A     B      0.05
A     C      0.84
B     C      0.97

Thank you everyone for find ways to answer.

I have tried in python but I can solve this problem. So, I try it in excel instead. I'm sorry that I forgot to tagged Excel — Maetha, Mar 20 '20 at 08:47

remove duplicated list in python

2 Answers2