0

I'm trying to datafill 2 new columns from 2 existing columns, based on the value of another column.

The logic is that given a positive amount the buyer and seller should be filled out from the party and cparty fields respectively. If the amount is negative then the situation is reversed and the buyer is the cparty rather than the party, and the seller is the party.

I'm trying to avoid doing something iterative - I can get each component using the expressions below but, but having tried to concatenate these results with concat, +, +=, combine_first, fillna and update, I've drawn a blank over how to merge the results.

Each time they're either been overwritten (I suspect because Pandas matches on the column name, and not position) or I get 2 empty columns.

There must be a nice clean pythonic way to combine the below, or similar?

df[['Buyer', 'Seller']] = df[df.amount > 0][['party', 'cparty']]
df[['Buyer', 'Seller']] = df[df.amount < 0][['cparty', 'party']]
Phil
  • 592
  • 6
  • 15
  • I don't know if there is such an answer. But I put here the test df for others. `df=pd.DataFrame({"a":[1, 2, 3],"b":[4,5,6], "c":[1, -1, 1]})` – Nelson Dinh Nov 27 '17 at 13:55

1 Answers1

1

Maybe you are looking for np.where as a one liner i.e

For example :

df = pd.DataFrame({'key': ['a','b','b','c','c'],'key2': ['a','d','d','e','e'],'key3': ['j','k','l','m','n'], 'x': [1,2,3,4,5]})

df[['new1','new2']] = pd.DataFrame(np.where(df['x']>2,(df['key3'],df['key2']),(df['key2'],df['key3'])).T)

   key key2 key3  x new1 new2
0   a    a    j  1    a    j
1   b    d    k  2    d    k
2   b    d    l  3    l    d
3   c    e    m  4    m    e
4   c    e    n  5    n    e

In your case you can do

df[['Buyer', 'Seller']] = pd.DataFrame(np.where(df.amount < 0,(df['cparty'],df['party']),(df['party'],df['cparty'])).T)
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
  • Thanks. This works well. For my understanding, I'm still curious of how to combine the two results in my original question, although I accept this answer is probably tidier because the if and else branches can be handled in a single call to where(). One last thing - the intermediate state before the transpose, looks like a list of tuples of [(new1,new2)], but when I inspect it it's a an array of 2 lists [[new1],[new2]]. I would expect a list of tuples not a list of 2 lists? – Phil Nov 27 '17 at 14:42
  • 1
    Yes tuples will be converted to a a 2D array. – Bharath M Shetty Nov 27 '17 at 14:44
  • 1
    Interestingly `df['Buyer'], df['Seller'] = zip(*np.where(df.amount < 0,(df['cparty'],df['party']),(df['party'],df['cparty'])).T)` is about 20% faster for 100000 iterations using timeit. It's functionally identical. – Phil Nov 27 '17 at 15:44