Get a value from another row and add to the current row based on a certain info included in the current row - for the entire dataframe

Question

Although I'm aware of how to get info of specific rows based on some data (and in particular of this rock-solid answer about that), I'm struggling to figure out the best way of doing this for the entire dataframe, hopefully without having to loop through it.

I have records of "buy" and "sell" transactions, all in the same df. What triggered each "buy" transaction is recorded in the respective row where the "buy" transaction is saved. I need to copy that information to the equivalent "sell" transactions. The sell transaction rows contain the "ID" of the "buy" transactions, which are the index of the df.

     Type   Trigger   Buy ID
ID   
11   buy     AA       ---
12   buy     BB       ---
13   sell    ---      11
14   sell    ---      12

So I want to create a new column (e.g. "BuyTrigger") and add the trigger from the "buy" transaction to the "sell" rows. What I have tried (among others):

df = data[data["Type"] == "sell"]
df["BuyTrigger"] = data.loc[df["Buy order ID"]]["Trigger"]

Above, I've created a copy of a slice of the data with the sell transactions only to make the process faster. Buy that is returning all NANs for "Buy Trigger".

And just to make it super clear: there are two dataframes. DATA contains all records; DF contains only the sell transactions.

Expected result:

     Type   Trigger   Buy ID  BuyTrigger
ID   
11   buy     AA       ---     --- 
12   buy     BB       ---     ---
13   sell    ---      11      AA
14   sell    ---      12      BB

I appreciate your help!

score 1 · Answer 1 · answered Mar 13 '21 at 01:46

Here's a code snippet that should do what you want:

import pandas as pd

def handle(data):
    buy = data[data["Type"]=="buy"][["ID", "Trigger"]]
    df = data.merge(buy, left_on="Buy ID", right_on="ID", how="left", suffixes=("", "_"))
    df = df.drop(["ID_"], axis=1).rename(columns={"Trigger_": "BuyTrigger"})
    return df

data = pd.DataFrame({
    "ID": [11,12,13,14],
    "Type": ["buy", "buy", "sell", "sell"],
    "Trigger": ["AA", "BB", "", ""],
    "Buy ID": ["", "", 11, 12]
})
result = handle(data)

It works by:

Selecting the rows containing the buy information.
Merging the tables to find the corresponding sell for each buy.
Deleting/renaming the columns per your description.

I think this is too complicated. There is an easier way to do this using apply and df.loc — Joe Ferndz, Mar 13 '21 at 02:07

score 1 · Accepted Answer · answered Mar 13 '21 at 02:05

You can use apply and df.loc to get the answer. Here's how I would do it.

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID':[11,12,13,14],
                   'Type':['buy','buy','sell','sell'],
                   'Trigger':['AA','BB','---','---'],
                   'Buy_ID':['---','---',11,12]}).set_index('ID')
df["BuyTrigger"] = df['Buy_ID'].apply(lambda x: df.loc[x,'Trigger'] if x != '---' else np.NaN)
print (df)

Output of this is:

    Type Trigger Buy_ID BuyTrigger
ID                                
11   buy      AA    ---        NaN
12   buy      BB    ---        NaN
13  sell     ---     11         AA
14  sell     ---     12         BB

Instead of np.NaN, you can also give '---'. Then the answer will be:

    Type Trigger Buy_ID BuyTrigger
ID                                
11   buy      AA    ---        ---
12   buy      BB    ---        ---
13  sell     ---     11         AA
14  sell     ---     12         BB

Get a value from another row and add to the current row based on a certain info included in the current row - for the entire dataframe

2 Answers2