-1

I want to remove duplicate lists in my data after I use stack(). This is example

node1 node2  value
A     B      0.05
A     C      0.84
B     A      0.05
B     C      0.97
C     A      0.84
C     B      0.97

into

node1 node2  value
A     B      0.05
A     C      0.84
B     C      0.97

Thanks

Edit: I attach a photo of my data. Row 2 and 174 have same node but different position. I want to keep only one interaction (row2) Ps. the value aren't unique. enter image description here

Maetha
  • 43
  • 1
  • 1
  • 9

2 Answers2

2

Right, So I'm a Python rookie/beginner but tried a few different things with bits and pieces from what I could find:

import pandas as pd
import numpy as np
df = pd.DataFrame({'node1':['A','A','B','B','C','C'],
                   'node2':['B','C','A','C','A','B'],
                   'value':[0.05,0.84,0.05,0.97,0.84,0.97]})
df = df[['value','node1','node2']]                                                         #Step 1: re-arange dataframe to work properly
df['value'] = df['value'].astype(str)                                                      #Step 2: float type (numbers) to string type
df = pd.DataFrame(np.sort(df.values, axis=1), columns=df.columns).drop_duplicates()        #Step 3: Sort and drop dups
df['value'] = df['value'].astype(float)                                                    #Step 4: string type back to float
df = df[['node1','node2','value']]                                                         #Step 5: re-arange dataframe back to starting positions
print(df)

I used and want to refer to:

  • This Pandas documentation on Panda's .to_string function
  • This SO post to re-position columns as I found out np.sort didn't shift headers.
  • This SO post to sort values and drop duplicates.

I'm a 100% sure that this can be shortened to a smoother piece of code by the right person. Hope it helps in the meantime!

JvdV
  • 70,606
  • 8
  • 39
  • 70
0

Finally I find other ways in excel and I solved it using if-else function

node1 node2  value
A     B      0.05
A     C      0.84
B     A      0.05
B     C      0.97
C     A      0.84
C     B      0.97

I add new column and compare between col1 and col2 which one have higher value if(A2>B2,1,0)

node1 node2  value   cond
A     B      0.05    0
A     C      0.84    0
B     A      0.05    1
B     C      0.97    0
C     A      0.84    1
C     B      0.97    1

and I swap col1 and col2 which cond=1

node1 node2  value   cond
A     B      0.05    0
A     C      0.84    0
A     B      0.05    1
B     C      0.97    0
A     C      0.84    1
B     C      0.97    1

Finally I remove duplicates in table

node1 node2  value
A     B      0.05
A     C      0.84
B     C      0.97

Thank you everyone for find ways to answer.

Maetha
  • 43
  • 1
  • 1
  • 9
  • I have tried in python but I can solve this problem. So, I try it in excel instead. I'm sorry that I forgot to tagged Excel – Maetha Mar 20 '20 at 08:47