
The sample data is:

import pandas as pd
import numpy as np
d=pd.DataFrame({'lender':['tony','wood','tony','tidy'],
                'borrower':['wood','tony','wood','tony']})


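I want to join lender and borrower into one string, and most importantly, sort the two names within each row first. In short, I want a column P where, for example, both a tony/wood row and a wood/tony row become tony_wood.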


The sort order is the same as Python's built-in sorted function. For example:

sorted(['tony','wood'])
Out[221]: ['tony', 'wood']

sorted(['wood','tony'])
Out[222]: ['tony', 'wood']
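Because the built-in sorted compares strings by Unicode code point, the order is case-sensitive: every uppercase ASCII letter sorts before every lowercase one. The sample data is all lowercase, so this does not matter there; the sketch below, assuming some names could be capitalized, shows the default behaviour and one common workaround (`key=str.lower`), which goes beyond what the question asks for.

```python
# Default sorted() compares strings by Unicode code point,
# so 'W' (U+0057) sorts before 't' (U+0074).
default_order = sorted(['tony', 'Wood'])

# Assumed variation: case-insensitive ordering via key=str.lower.
case_insensitive = sorted(['tony', 'Wood'], key=str.lower)

# Either input order gives the same result, which is what makes
# sorted() usable as a canonical key for an unordered pair.
same_either_way = sorted(['wood', 'tony']) == sorted(['tony', 'wood'])

print(default_order)      # ['Wood', 'tony']
print(case_insensitive)   # ['tony', 'Wood']
print(same_either_way)    # True
```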

I would prefer apply, and only use a for loop if there is no other way. Please read the question carefully before marking it as a duplicate!

adafdwwf

1 Answer


One solution is to use apply with sorted and join on each row:

d['p'] = d[['lender','borrower']].apply(lambda x: '_'.join(sorted(x)), axis=1)
print (d)
  lender borrower          p
0   tony     wood  tony_wood
1   wood     tony  tony_wood
2   tony     wood  tony_wood
3   tidy     tony  tidy_tony
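The same apply approach can be wrapped so that it works for any list of columns and any separator. This is a minimal sketch, not part of the original answer; the helper name `sorted_join` and its `sep` parameter are hypothetical.

```python
import pandas as pd


def sorted_join(df, cols, sep='_'):
    """Hypothetical helper: sort the values of `cols` within each row
    and join them with `sep`, using DataFrame.apply(axis=1)."""
    # Each row arrives as a Series; sorted() returns its values as a sorted list.
    return df[cols].apply(lambda row: sep.join(sorted(row)), axis=1)


d = pd.DataFrame({'lender': ['tony', 'wood', 'tony', 'tidy'],
                  'borrower': ['wood', 'tony', 'wood', 'tony']})
d['p'] = sorted_join(d, ['lender', 'borrower'])
print(d['p'].tolist())  # ['tony_wood', 'tony_wood', 'tony_wood', 'tidy_tony']
```

Since the values are sorted inside each row, the order in which the columns are listed does not change the result.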

Or, if performance is important, use numpy.sort with the DataFrame constructor:

d1 = pd.DataFrame(np.sort(d[['lender','borrower']], axis=1))
d['p'] = d1[0] + '_' + d1[1]
print (d)
  lender borrower          p
0   tony     wood  tony_wood
1   wood     tony  tony_wood
2   tony     wood  tony_wood
3   tidy     tony  tidy_tony
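The numpy.sort route can also be extended to more than two columns. The sketch below is an assumed variation, not part of the original answer: it converts the selected columns to a fixed-width string array with `DataFrame.to_numpy(dtype=str)`, sorts each row with `np.sort(..., axis=1)`, and then concatenates the sorted columns with `np.char.add`, looping over columns rather than rows. No timing claims are made here; measure on your own data.

```python
from functools import reduce

import numpy as np
import pandas as pd

d = pd.DataFrame({'lender': ['tony', 'wood', 'tony', 'tidy'],
                  'borrower': ['wood', 'tony', 'wood', 'tony']})
cols = ['lender', 'borrower']

# Fixed-width Unicode array, sorted within each row.
arr = np.sort(d[cols].to_numpy(dtype=str), axis=1)

# Concatenate the sorted columns left to right, inserting '_' between them.
joined = reduce(lambda acc, col: np.char.add(np.char.add(acc, '_'), col),
                (arr[:, i] for i in range(1, arr.shape[1])),
                arr[:, 0])

d['p'] = joined
print(d['p'].tolist())
```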
jezrael