
I have this pandas dataframe where a 1 in long_entry or short_entry represents entering a trade at that time with the corresponding long/short position, while a 1 in long_exit or short_exit means exiting a trade. How can I calculate the PnL of every single trade and display it in a new column df['pnl_per_trade'] (shown as pnl in the expected output below)?

There is at most one open trade/position at any point in time in this backtest.

Below is my dataframe. As we can see, a long trade is entered on 26/2/2019 and closed on 1/3/2019 with a PnL of $64.45, while a short trade is entered on 4/3/2019 and closed on 5/3/2019 with a PnL of -$119.11 (a loss); the arithmetic is spelled out after the expected output below.

        date    price       long_entry  long_exit   short_entry short_exit
0   24/2/2019   4124.25           0          0           0              0
1   25/2/2019   4130.67           0          0           0              0
2   26/2/2019   4145.67           1          0           0              0
3   27/2/2019   4180.10           0          0           0              0
4   28/2/2019   4200.05           0          0           0              0
5   1/3/2019    4210.12           0          1           0              0
6   2/3/2019    4198.10           0          0           0              0
7   3/3/2019    4210.34           0          0           0              0
8   4/3/2019    4100.12           0          0           1              0
9   5/3/2019    4219.23           0          0           0              1

I hope to have an output like this:

        date    price       long_entry  long_exit   short_entry short_exit  pnl
0   24/2/2019   4124.25           0          0           0             0    NaN
1   25/2/2019   4130.67           0          0           0             0    NaN
2   26/2/2019   4145.67           1          0           0             0  64.45
3   27/2/2019   4180.10           0          0           0             0    NaN
4   28/2/2019   4200.05           0          0           0             0    NaN
5   1/3/2019    4210.12           0          1           0             0    NaN
6   2/3/2019    4198.10           0          0           0             0    NaN
7   3/3/2019    4210.34           0          0           0             0    NaN
8   4/3/2019    4100.12           0          0           1             0 -119.11
9   5/3/2019    4219.23           0          0           0             1    NaN
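
Just to spell out the arithmetic behind the two expected values (a long trade pays exit minus entry, a short trade pays entry minus exit):

long_pnl = 4210.12 - 4145.67    # exit - entry for the long trade  -> approx. 64.45
short_pnl = 4100.12 - 4219.23   # entry - exit for the short trade -> approx. -119.11
print(round(long_pnl, 2), round(short_pnl, 2))   # 64.45 -119.11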

Since I have lots of data, I would prefer the code to avoid having any loops if possible. Thank you!

atjw94

2 Answers


I extended your sample data to have 2 long PnL values and changed the date column to datetime (note dayfirst=True below, since the dates are in day/month/year format):

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[
    [ '24/2/2019', 4124.25, 0, 0, 0, 0 ],
    [ '25/2/2019', 4130.67, 0, 0, 0, 0 ],
    [ '26/2/2019', 4145.67, 1, 0, 0, 0 ],
    [ '27/2/2019', 4180.10, 0, 0, 0, 0 ],
    [ '28/2/2019', 4200.05, 0, 0, 0, 0 ],
    [ '1/3/2019',  4210.12, 0, 1, 0, 0 ],
    [ '2/3/2019',  4198.10, 0, 0, 0, 0 ],
    [ '3/3/2019',  4210.34, 0, 0, 0, 0 ],
    [ '4/3/2019',  4100.12, 0, 0, 1, 0 ],
    [ '5/3/2019',  4219.23, 0, 0, 0, 1 ],
    [ '6/3/2019',  4210.00, 1, 0, 0, 0 ],
    [ '7/3/2019',  4212.00, 0, 0, 0, 0 ],
    [ '8/3/2019',  4214.00, 0, 1, 0, 0 ]],
    columns=['date','price', 'long_entry', 'long_exit',
        'short_entry', 'short_exit'])
df.date = pd.to_datetime(df.date, dayfirst=True)

The next step is to generate df2, containing only the rows where a long trade starts or ends (actually only the date and price columns will be needed, but for illustration purposes I also included long_entry and long_exit):

df2 = df.query('long_entry > 0 or long_exit > 0').iloc[:,0:4]; df2

The result (for my data) is:

         date    price  long_entry  long_exit
2  2019-02-26  4145.67           1          0
5  2019-03-01  4210.12           0          1
10 2019-03-06  4210.00           1          0
12 2019-03-08  4214.00           0          1

Then we define a function that will shortly be applied to each entry/exit pair:

def fn(src):
    # src holds a pair of rows: the entry row followed by the exit row.
    # Return the entry date and the price difference (exit price - entry price).
    return pd.Series([src.iloc[0, 0], src.iloc[1, 1] - src.iloc[0, 1]])
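
As a quick illustrative check, calling fn on the first two rows of df2 (the first entry/exit pair) returns the entry date and the price difference:

fn(df2.iloc[0:2])   # -> a Series holding Timestamp('2019-02-26') and (approximately) 64.45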

The next step is to apply the above function to consecutive pairs of rows (entry and exit), set the column names and make the date column the index:

lProf = df2.groupby(np.arange(len(df2.index)) // 2).apply(fn)
lProf.columns = ['date', 'pnl']
lProf.set_index('date', inplace=True)

The result is:

             pnl
date             
2019-02-26  64.45
2019-03-06   4.00

So far we have the data to insert for the long trades. Now it's time to generate a similar DataFrame for the short trades, applying the same function as before:

df2 = df.query('short_entry > 0 or short_exit > 0').iloc[:,[0, 1, 4, 5]]
sProf = df2.groupby(np.arange(len(df2.index)) // 2).apply(fn)
sProf.columns = ['date', 'pnl']
sProf.set_index('date', inplace=True)

But this time we have to flip the sign of the resulting values (a short trade gains when the price falls):

sProf = -sProf

The result is:

               pnl
date              
2019-03-04 -119.11

Before we add the results to the main DataFrame, we have to set the date column as the index:

df.set_index('date', inplace=True)

And now we add the results of long entries:

df['pnl'] = lProf['pnl']

This has created a new column, so now, to merge in the results of the short trades, we use update instead:

df.update(sProf)

If you want to have date back as regular column, run:

df.reset_index(inplace=True)
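
For convenience, the whole recipe can be wrapped in a small helper that handles both the long and the short legs (a sketch along the same lines; pair_pnl is just a name I picked, and df is assumed to still hold date as a regular column plus the entry/exit flags):

import numpy as np
import pandas as pd

def pair_pnl(frame, entry_col, exit_col, sign=1):
    # Keep only the entry/exit rows, pair consecutive rows, and compute the
    # price difference (exit - entry) per pair, signed for long/short.
    rows = frame.query(f'{entry_col} > 0 or {exit_col} > 0')
    groups = rows.groupby(np.arange(len(rows)) // 2)
    pnl = groups.apply(lambda g: pd.Series(
        {'date': g['date'].iloc[0],
         'pnl': sign * (g['price'].iloc[1] - g['price'].iloc[0])}))
    return pnl.set_index('date')

out = df.set_index('date')
out['pnl'] = pair_pnl(df, 'long_entry', 'long_exit')['pnl']      # long trades
out.update(pair_pnl(df, 'short_entry', 'short_exit', sign=-1))    # short trades
out = out.reset_index()

This should reproduce the pnl column built step by step above.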
Valdi_Bo
  • Thank you! Could you explain to me how the function works? And also what does df2.groupby do? – atjw94 May 19 '19 at 15:48
  • To see how each part of the solution works, execute each instruction, print the result and identify what that step has done on the way to the final result. As far as *groupby* is concerned, you should probably get acquainted with the *pandas* documentation. But be aware that e.g. *apply* has 2 "flavours": 1. application to each row or column of the whole DataFrame; 2. application to each group (resulting from *groupby*). Take also some time to search the Web for particular details; even on StackOverflow numerous descriptions of e.g. use cases of *groupby* can be found. A tiny illustration of the per-group flavour is sketched below. – Valdi_Bo May 19 '19 at 18:00
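
A minimal illustration of that second flavour, on made-up data (nothing to do with the trading frame):

import pandas as pd

demo = pd.DataFrame({'key': [0, 0, 1, 1], 'price': [10.0, 25.0, 7.0, 9.0]})

# apply() on a groupby object runs the function once per group;
# here each group's result is its last value minus its first value.
demo.groupby('key')['price'].apply(lambda s: s.iloc[-1] - s.iloc[0])
# key
# 0    15.0
# 1     2.0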

I am not sure if this will help, but I think your concept of PnL might not be right. The following shows how to get a running (cumulative) PnL value instead of each position's PnL.

def get_position(long_entry, long_exit, short_entry, short_exit):
    # +1 = buy (enter a long or cover a short), -1 = sell (exit a long or enter a short)
    if long_entry == 1 or short_exit == 1:
        position = 1
    elif long_exit == 1 or short_entry == 1:
        position = -1
    else:
        position = 0

    return position

df['position'] = list(map(get_position, df.long_entry.values, df.long_exit.values,
                          df.short_entry.values, df.short_exit.values))

df = df[['date', 'price', 'position']]

df['amount'] = -df['price'] * df['position']   # cash flow of the day (negative when buying)
df['pnl'] = df['amount'].cumsum()              # running total of cash flows

This is the result:

        date    price  position   amount      pnl
0  24/2/2019  4124.25         0    -0.00    -0.00
1  25/2/2019  4130.67         0    -0.00    -0.00
2  26/2/2019  4145.67         1 -4145.67 -4145.67
3  27/2/2019  4180.10         0    -0.00 -4145.67
4  28/2/2019  4200.05         0    -0.00 -4145.67
5   1/3/2019  4210.12        -1  4210.12    64.45
6   2/3/2019  4198.10         0    -0.00    64.45
7   3/3/2019  4210.34         0    -0.00    64.45
8   4/3/2019  4100.12        -1  4100.12  4164.57
9   5/3/2019  4219.23         1 -4219.23   -54.66

This is the cumulative PnL across both long and short positions. Hope it helps.
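
If the per-trade figures from the question are still wanted, they can be recovered by differencing this cumulative pnl at the rows where a trade is closed (a sketch; note that the column selection above drops the exit flags, so the mask has to be built before that step, and it relies on there being at most one open position at a time):

# Build this mask before df is reduced to date/price/position
exit_rows = (df['long_exit'] == 1) | (df['short_exit'] == 1)

per_trade = df.loc[exit_rows, 'pnl'].diff()
per_trade.iloc[0] = df.loc[exit_rows, 'pnl'].iloc[0]   # first exit has no earlier exit to subtract
# per_trade -> 64.45 at row 5 and -119.11 at row 9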

lrh09