0

I am trying to build an equity curve in Python using Pandas. For those not in the know, an equity curve is a cumulative tally of investing profits/losses day by day. The code below works but it is incredibly slow. I've tried to build an alternate using Pandas .iloc and such but nothing is working. I'm not sure if it is possible to do this outside of a loop given how I have to reference the prior row(s).

for today in range(len(f1)): #initiate a loop that runs the length of the "f1" dataframe 

    if today == 0: #if the index value is zero (aka first row in the dataframe) then...

        f1.loc[today,'StartAUM'] = StartAUM #Set intial assets 
        f1.loc[today,'Shares'] = 0 #dummy placeholder for shares; no trading on day 1 
        f1.loc[today,'PnL'] = 0 #dummy placeholder for P&L; no trading day 1 
        f1.loc[today,'EndAUM'] = StartAUM #set ending AUM; should be beginning AUM since no trades 
        continue #and on to the second row in the dataframe 

    yesterday = today - 1 #used to reference the rows (see below)

    f1.loc[today,'StartAUM'] = f1.loc[yesterday,'EndAUM'] #todays starting aseets are yesterday's ending assets 
    f1.loc[today,'Shares'] = f1.loc[yesterday,'EndAUM']//f1.loc[yesterday,'Shareprice'] #today's shares to trade = yesterday's assets/yesterday's share price 
    f1.loc[today,'PnL'] = f1.loc[today,'Shares']*f1.loc[today,'Outcome1'] #Our P&L should be the shares traded (see prior line) multiplied by the outcome for 1 share
    #Note Outcome1 came from the dataframe before this loop >> for the purposes here it's value is irrelevant 
    f1.loc[today,'EndAUM'] = f1.loc[today,'StartAUM']+f1.loc[today,'PnL'] #ending assets are starting assets + today's P&L 
GC123
  • 323
  • 5
  • 15

6 Answers6

0

There is a good example here: http://www.pythonforfinance.net/category/basic-data-analysis/ and I know that there is an example in Wes McKinney's book Python for Data Analysis. You might be able to find it here: http://wesmckinney.com/blog/python-for-financial-data-analysis-with-pandas/

Harvey
  • 329
  • 2
  • 15
  • In Python for Data Analysis (the original) pg 338 there is a section on Return Indexes and Cumulative returns and focuses on things like ".pct_change()", ".cumprod()" but these don't address this issue. The pythonforfinance website link you sent also uses cumsum and thus I don't believe works here. If you can let me know how I could use them in this context it would be appreciated. @HarveySummers – GC123 Oct 24 '17 at 20:00
  • Please include the contents of the link in your answer – Ibo Nov 03 '17 at 03:33
0

You need to vectorize the operations (don't iterate with for but rather compute whole column at once)

# fill the initial values
f1['StartAUM'] = StartAUM  # Set intial assets
f1['Shares'] = 0  # dummy placeholder for shares; no trading on day 1
f1['PnL'] = 0  # dummy placeholder for P&L; no trading day 1
f1['EndAUM'] = StartAUM  # s

#do the computations (vectorized)
f1['StartAUM'].iloc[1:] = f1['EndAUM'].iloc[:-1]
f1['Shares'].iloc[1:] = f1['EndAUM'].iloc[:-1] // f1['Shareprice'].iloc[:-1] 
f1['PnL'] = f1['Shares'] * f1['Outcome1']  
f1['EndAUM'] = f1['StartAUM'] + f1 ['PnL'] 

EDIT: this will not work correctly since StartAUM, EndAUM, Shares depend on each other and cannot be computed one without another. I didn't notice that before.

matusko
  • 3,487
  • 3
  • 20
  • 31
  • Thank you for the response. The code you shared is not working. The StartAUM is not updating each new day, it is staying a constant. Otherwise the code looks good. Thoughts on the StartAUM issue? Thanks – GC123 Oct 23 '17 at 21:22
  • @matsuko can you update the code per my prior question? – GC123 Oct 24 '17 at 16:09
0

Have you tried using iterrows() to construct the for loop?

for index, row in f1.iterrows():
    if today == 0:
        row['StartAUM'] = StartAUM #Set intial assets 
        row['Shares'] = 0 #dummy placeholder for shares; no trading on day 1 
        row['PnL'] = 0 #dummy placeholder for P&L; no trading day 1 
        row['EndAUM'] = StartAUM #set ending AUM; should be beginning AUM since no trades 
        continue #and on to the second row in the dataframe 

yesterday = row[today] - 1 #used to reference the rows (see below)

row['StartAUM'] = row['EndAUM'] #todays starting aseets are yesterday's ending assets 
row['Shares'] = row['EndAUM']//['Shareprice'] #today's shares to trade = yesterday's assets/yesterday's share price 
row['PnL'] = row['Shares']*row['Outcome1'] #Our P&L should be the shares traded (see prior line) multiplied by the outcome for 1 share
#Note Outcome1 came from the dataframe before this loop >> for the purposes here it's value is irrelevant 
row['EndAUM'] = row['StartAUM']+row['PnL'] #ending assets are starting assets + today's P&L 

Probably the code is so slow as loc goes through f1 from beginning every time. iterrows() uses the same dataframe as it loops through it row by row. See more details about iterrows() here.

MapleSyrup
  • 104
  • 9
0

Can you try the following:

#import relevant modules
import pandas as pd
import numpy as np
from pandas_datareader import data
import matplotlib.pyplot as plt


#download data into DataFrame and create moving averages columns
f1 = data.DataReader('AAPL', 'yahoo',start='1/1/2017')

StartAUM = 1000000

#populate DataFrame with starting values
f1['Shares'] = 0
f1['PnL'] = 0
f1['EndAUM'] = StartAUM

#Set shares held to be the previous day's EndAUM divided by the previous day's closing price
f1['Shares'] = f1['EndAUM'].shift(1) / f1['Adj Close'].shift(1)
#Set the day's PnL to be the number of shares held multiplied by the change in closing price from yesterday to today's close
f1['PnL'] = f1['Shares'] * (f1['Adj Close'] - f1['Adj Close'].shift(1))
#Set day's ending AUM to be previous days ending AUM plus daily PnL
f1['EndAUM'] = f1['EndAUM'].shift(1) + f1['PnL']
#Plot the equity curve
f1['EndAUM'].plot()

Does the above solve your issue?

s666
  • 528
  • 4
  • 13
  • 33
0

The solution was to use the Numba package. It performs the loop task in a fraction of the time.

https://numba.pydata.org/

The arguments/dataframe can be passed to the numba module/function. I will try to write up a more detailed explanation with code when time permits.

Thanks to all

GC123
  • 323
  • 5
  • 15
0

In case others come across this, you can definitely make an equity curve without loops.

Dummy up some data

import pandas as pd
import numpy as np
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (13, 10)

# Some data to work with
np.random.seed(1)
stock = pd.DataFrame(
    np.random.randn(100).cumsum() + 10, 
    index=pd.date_range('1/1/2020', periods=100, freq='D'),
    columns=['Close']
    )
stock['ma_5'] = stock['Close'].rolling(5).mean()
stock['ma_15'] = stock['Close'].rolling(15).mean()

Holdings: simple long/short based on moving average crossover signals

longs = stock['Close'].where(stock['ma_5'] > stock['ma_15'], np.nan)
shorts = stock['Close'].where(stock['ma_5'] < stock['ma_15'], np.nan)

# Quick plot
stock.plot()
longs.plot(lw=5, c='green')
shorts.plot(lw=5, c='red')

enter image description here

EQUITY CURVE:

Identify which side (l/s) has first holding (ie: first trade, in this case, short), then keep the initial trade price and subsequently cumulatively sum the daily changes (there would normally be more nan's in the series if you have exit rules as well for when you are out of the market), and finally forward fill over the nan values and fill any last remaining nans with zeros. Its basically the same for the second opposite holdings (in this case, long) except don't keep the starting price. The other important thing is to invert the short daily changes (ie: negative changes should be positive to the PnL).

lidx = np.where(longs > 0)[0][0]
sidx = np.where(shorts > 0)[0][0]
startdx = min(lidx, sidx)

# For first holding side, keep first trade price, then calc daily change fwd and ffill nan's
# For second holdng side, get cumsum of daily changes, ffill and fillna(0) (make sure short changes are inverted)
if lidx == startdx:
    lcurve = longs.diff() # get daily changes
    lcurve[lidx] = longs[lidx] # put back initial starting price
    lcurve = lcurve.cumsum().ffill() # add dialy changes/ffill to build curve
    scurve = -shorts.diff().cumsum().ffill().fillna(0) # get daily changes (make declines positive changes)
else:
    scurve = -shorts.diff() # get daily changes (make declines positive changes)
    scurve[sidx] = shorts[sidx] # put back initial starting price
    scurve = scurve.cumsum().ffill() # add dialy changes/ffill to build curve
    lcurve = longs.diff().cumsum().ffill().fillna(0) # get daily changes

Add the 2 long/short curves together to get the final equity curve

eq_curve = lcurve + scurve

# quick plot
stock.iloc[:, :3].plot()
longs.plot(lw=5, c='green', label='Long')
shorts.plot(lw=5, c='red', label='Short')
eq_curve.plot(lw=2, ls='dotted', c='orange', label='Equity Curve')
plt.legend()

enter image description here

footfalcon
  • 581
  • 5
  • 16