
I am using Python Pandas to process the dataframe below:

dataframe:

Company_List1    Company_List2
A                B
A                C
B                D
B                A
D                B
E                F

I want to remove the rows below, since each one is just the reverse of a pair I already have (A --> B and B --> D):

(However, I cannot simply use drop_duplicates to do the work)

Company_List1    Company_List2
B                A
D                B
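
To illustrate the drop_duplicates remark above, here is a minimal sketch, assuming the frame is rebuilt from the table in the question (the name `deduped` is only illustrative). Since drop_duplicates compares rows column by column, (B, A) is not treated as a duplicate of (A, B), and nothing is removed:

```python
import pandas as pd

# Frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Company_List1": ["A", "A", "B", "B", "D", "E"],
    "Company_List2": ["B", "C", "D", "A", "B", "F"],
})

# Plain drop_duplicates compares rows column by column, so a reversed
# pair such as (B, A) does not match (A, B).
deduped = df.drop_duplicates()

# Both counts are the same: no row was dropped.
print(len(df), len(deduped))
```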

Expected output:

Company_List1    Company_List2
A                B
A                C
B                D
E                F

Thank you in advance for the help!

Boomshakalaka

1 Answer

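import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.sort(df.values, axis=1), columns=df.columns)  # sort values row-wise so (B, A) becomes (A, B)
df.iloc[df1.drop_duplicates(keep='first').index, :]  # take the surviving positions and select those rows from df (assumes df has the default 0..n-1 index)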
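
For reference, the same idea can be sketched as a self-contained script run against the question's data. This is a variant rather than the exact snippet above: it assumes the frame is rebuilt from the question's table, uses the illustrative names `sorted_pairs` and `result`, and filters with a boolean mask from `duplicated()` so it does not depend on `df` having the default index.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Company_List1": ["A", "A", "B", "B", "D", "E"],
    "Company_List2": ["B", "C", "D", "A", "B", "F"],
})

# Sort the two values within each row so that a pair and its reverse
# become identical, e.g. (B, A) -> (A, B). Keep df's index so the
# mask below aligns with the original rows.
sorted_pairs = pd.DataFrame(
    np.sort(df.to_numpy(), axis=1),
    columns=df.columns,
    index=df.index,
)

# duplicated() marks every repeat after the first occurrence;
# inverting the mask keeps the first occurrence of each unordered pair.
result = df[~sorted_pairs.duplicated(keep="first")]
print(result)
```

The rows kept are taken from the original `df`, so their original column order is preserved; only the comparison is done on the sorted copy.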
wwnde