10

Greetings all. I have two series of data: daily raw stock price returns (positive or negative floats) and trade signals (buy = 1, sell = -1, no trade = 0).

The raw price returns are simply the log of today's price divided by yesterday's price:

log(p_today / p_yesterday)

An example:

raw_return_series = [ 0.0063 -0.0031 0.0024 ..., -0.0221 0.0097 -0.0015]

The trade signal series looks like this:

signal_series = [-1. 0. -1. -1. 0. 0. -1. 0. 0. 0.]

To get the daily returns based on the trade signals:

daily_returns = [raw_return_series[i] * signal_series[i+1] for i in range(0, len(signal_series)-1)]

These daily returns might look like this:

[0.0, 0.00316, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0] # results in daily_returns; notice the 0s
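The same signal-lagged multiplication can be vectorized with numpy; a minimal sketch using abbreviated versions of the example series above (assuming both are numpy arrays of the same length):

```python
import numpy as np

# Abbreviated versions of the example series above (equal length assumed)
raw_return_series = np.array([0.0063, -0.0031, 0.0024, -0.0221, 0.0097, -0.0015])
signal_series = np.array([-1.0, 0.0, -1.0, -1.0, 0.0, 0.0])

# Same pairing as the comprehension: today's raw return times tomorrow's signal
daily_returns = raw_return_series[:-1] * signal_series[1:]
```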

I need to use the daily_returns series to compute a compounded returns series. However, given that there are 0 values in the daily_returns series, I need to carry over the last non-zero compound return "through time" to the next non-zero compound return.

For example, I compute the compound returns like this (notice I am going "backwards" through time):

compound_returns = [(((1 + compounded[i + 1]) * (1 + daily_returns[i])) - 1) for i in range(len(compounded) - 2, -1, -1)]

and the resulting list:

[0.0, 0.0, 0.0023, 0.0, 0.0, -0.0024, 0.0031, 0.0] # (notice the 0s)

My goal is to carry the last non-zero return forward when accumulating these compound returns. That is, since the return at index i depends on the return at index i+1, the value at index i+1 should be the last non-zero compound return. As written, every time the list comprehension encounters a zero in the daily_returns series, it essentially restarts.
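One way to get the carry-forward behaviour is to compound forward in time: a zero daily return multiplies the running product by 1.0, so the last non-zero compound value is carried through the flat days automatically instead of restarting. A minimal sketch using the example daily_returns values above:

```python
# Forward compounding: a zero return leaves the running level unchanged,
# which carries the last non-zero compound value "through time"
daily_returns = [0.0, 0.0031, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0]

compound_returns = []
level = 1.0  # running growth factor
for r in daily_returns:
    level *= (1.0 + r)
    compound_returns.append(level - 1.0)
```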

Jason Strimpel
  • @strimp099: This kind of comprehension easily starts to look like quite a mess. May I suggest more suitable tools, like http://scipy.org/. Thanks – eat Apr 01 '11 at 14:59
  • @aix the method I need to use is ((1+daily[i])*(1+compounded[i-1]))-1 . The end goal is to create a plot of the cumulative compounded returns. The method you mention is just that: cumulative simple returns – it does not include previous returns (i.e. returns are not re-invested) – Jason Strimpel Apr 01 '11 at 15:23
  • @eat I'm not sure what you mean... – Jason Strimpel Apr 01 '11 at 15:23
  • @strimp099: Just that there exist packages in `python` like `scipy\numpy` that handle this kind of computation in a very straightforward manner. If you are working seriously with this kind of series, I'd recommend getting to know 'more advanced' ways to handle them than 'raw' `python`. Thanks – eat Apr 01 '11 at 15:33
  • @eat oh I see. I have been looking through the scipy and numpy packages and source for a method for this calculation. numpy has the finance package, but it does not include a method for calculating a compound return series (at least that I could find). – Jason Strimpel Apr 01 '11 at 15:34
  • @strimp099: It's nice to hear that you are aware of `scipy\numpy`. If something you need doesn't exist there, you are still free to implement the missing pieces according to your own requirements, but using more reasonable data types than 'raw' `python` can provide. You may re-tag and edit your question to indicate that `scipy\numpy` solutions are eligible as well. Thanks – eat Apr 01 '11 at 15:44

3 Answers

9

There is a fantastic module called pandas, written by a guy at AQR (a hedge fund), that excels at calculations like this. What you need is a way to handle "missing data"; as someone mentioned above, the basics are the nan (not a number) capabilities of scipy or numpy. However, even those libraries don't make financial calculations that much easier. If you use pandas, you can mark the data you don't want to consider as nan, and any future calculation will reject it while performing normal operations on the other data.

I have been using pandas on my trading platform for about 8 months... I wish I had started using it sooner.

Wes (the author) gave a talk at PyCon 2010 about the capabilities of the module; see the slides and video on the PyCon 2010 webpage. In that video he demonstrates how to get daily returns, run thousands of linear regressions on a matrix of returns (in a fraction of a second), timestamp and graph data... all done with this module. Combined with psyco, this is a beast of a financial analysis tool.

The other great thing it handles is cross-sectional data... so you could grab daily close prices, their rolling means, etc., then timestamp every calculation and store it all in something similar to a python dictionary (see the pandas.DataFrame class). You then access slices of the data as simply as:

close_prices['stdev_5d']

See the pandas rolling moments doc for more information on how to calculate the rolling stdev (it's a one-liner).
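For reference, a minimal sketch of that one-liner with current pandas, where the old rolling-moment functions now live on `.rolling()`; the close prices here are made up for illustration:

```python
import pandas as pd

# Hypothetical close-price series; 5-period rolling sample stdev.
# (The 2011-era spelling was pandas.rolling_std; modern pandas uses .rolling())
close = pd.Series([144.9, 144.9, 141.3, 141.2, 138.9, 140.4, 141.3])
stdev_5d = close.rolling(5).std()  # first 4 entries are NaN
```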

Wes has gone out of his way to speed the module up with cython, although I'll concede that I'm considering upgrading my server (an older Xeon), due to my analysis requirements.

EDIT FOR STRIMP's QUESTION: Now that you have converted your code to use pandas data structures, it's still unclear to me how you're indexing your data in a pandas DataFrame, and what the compounding function requires for handling missing data (or, for that matter, the conditions for a 0.0 return... or whether you are using NaN in pandas). I will demonstrate using my own data indexing. A day was picked at random; df is a DataFrame with ES futures quotes in it, indexed per second; missing quotes are filled in with numpy.nan. DataFrame indexes are datetime objects, offset by the pytz module's timezone objects.

>>> df.info
<bound method DataFrame.info of <class 'pandas.core.frame.DataFrame'>
Index: 86400 entries , 2011-03-21 00:00:00-04:00 to 2011-03-21 23:59:59-04:00
etf                                         18390  non-null values
etfvol                                      18390  non-null values
fut                                         29446  non-null values
futvol                                      23446  non-null values
...
>>> # ET is a pytz object...
>>> et
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
>>> # To get the futures quote at 9:45, eastern time...
>>> df.xs(et.localize(dt.datetime(2011,3,21,9,45,0)))['fut']
1291.75
>>>

To give a simple example of how to calculate a column of continuous returns (in a pandas.TimeSeries) that references the quote from 10 minutes earlier (filling in missing ticks), I would do this:

>>> df['fut'].fill(method='pad')/df['fut'].fill(method='pad').shift(600)

No lambda is required in that case, just dividing the column of values by itself 600 seconds ago. That .shift(600) part is because my data is indexed per-second.
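A toy sketch of the same pad-then-shift idea, using the modern `ffill()` spelling of the old `.fill(method='pad')`; the quotes and the shift of 2 are made up for illustration:

```python
import pandas as pd

# Toy quote series with missing ticks (NaN), shifted by 2 periods
# instead of 600 seconds to keep the example small
fut = pd.Series([1291.75, None, 1292.00, None, 1292.25])

# Forward-fill the gaps, then divide by the filled series 2 periods back
ratio = fut.ffill() / fut.ffill().shift(2)
```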

HTH, \mike

Mike Pennington
    (+1) for pointing out `pandas`. What kind of data volumes is it designed to handle? Could it realistically be useful on one-minute bars for 1,000 instruments over a year (roughly 100K observations per instrument), or is it designed for lower-frequency stuff? – NPE Apr 01 '11 at 16:49
  • I'm munching through tick data on index futures (ES, NQ, YM)... I summarize the data at low-second intervals and process during RTH... so far, my largest tick files are ESM10 (S&P 500 Futures) quotes from May 6th (flash crash), May 7th, and May 20th of last year... My simulations (including all calculations) for those run about 5-10 minutes for a single instrument. My May 6th file has about 1.2Million individual tick entries, and that processing time includes summarizing at intervals and over 15 different analysis per-trading-sec. My server is a 2GHz Xeon Quad-core (512KB Cache) with 4GB DRAM – Mike Pennington Apr 01 '11 at 17:05
  • Never thought of using pandas even though I'm familiar with it and it's included in my Python distro (enthought). I'll give it a shot and report back, thanks. – Jason Strimpel Apr 01 '11 at 17:20
  • @Mike Pennington: For sure `pandas` is way more specific to the OP's realm. However, IMHO, if some needed functionality is still missing and one has to create it oneself, one has a much higher chance of implementing it proficiently if one is also 'familiar enough' with `scipy\numpy`. Thanks – eat Apr 01 '11 at 17:47
  • Indeed, you need to be familiar with numpy to get the most out of pandas, but I have not found much that requires an outright extension... for the most part, I'm combining the outputs of various pandas methods to get what I need, if it's not already built-in. Most of my issues revolve around optimizing for speed, and that tends to involve more work with cython than pandas or numpy. – Mike Pennington Apr 01 '11 at 17:50
  • @Mike Pennington: So I was able to convert my ~1,000 line program easily to using pandas. Great call... any idea how I might get my compound return series created? I was thinking of using a lambda function in the apply method of DataMatrix but I'm having some challenges... Thanks – Jason Strimpel Apr 02 '11 at 22:34
  • @Strimp... see the edit to my answer above... without knowing more details, the best I can do is give an example with my data. – Mike Pennington Apr 03 '11 at 10:57
  • @Mike Pennington: Thanks, see my comment. – Jason Strimpel Apr 05 '11 at 01:27
  • Wes' video is now here: https://pyvideo.org/pycon-us-2010/python-in-quantitative-finance-158.html – Martien Lubberink Sep 05 '19 at 20:12
  • Crazy this was my first intro to Pandas. At this point it was on version 0.03. I can't count how many lines of Pandas I've written since then. Ten years ago, how time flies!! – Jason Strimpel Jan 22 '21 at 08:47
3

The cumulative return part of this question is dealt with in Wes McKinney's excellent 'Python for Data Analysis' book on page 339, and uses cumprod() from Pandas to create a rebased/indexed cumulative return from calculated price changes.

Example from book:

import pandas.io.data as web

price = web.get_data_yahoo('AAPL', '2011-01-01')['Adj Close']
returns = price.pct_change()
ret_index = (1 + returns).cumprod()
ret_index[0] = 1  # Set first value to 1
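The Yahoo data source above is no longer available in modern pandas, but the same rebased-index recipe works on any local price series; a minimal sketch with made-up prices:

```python
import pandas as pd

# Hypothetical price series standing in for the Yahoo download above
price = pd.Series([100.0, 101.0, 99.0, 102.0])

returns = price.pct_change()          # first value is NaN
ret_index = (1 + returns).cumprod()   # cumulative growth factor
ret_index.iloc[0] = 1                 # rebase the first value to 1
```

The resulting ret_index is simply each price divided by the starting price.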
Carl
1

Imagine I have a DataMatrix with closing prices, some indicator value, and a trade signal like this:

 >>> data_matrix
                        close          dvi            signal
 2008-01-02 00:00:00    144.9          0.6504         -1             
 2008-01-03 00:00:00    144.9          0.6603         -1             
 2008-01-04 00:00:00    141.3          0.7528         -1             
 2008-01-07 00:00:00    141.2          0.8226         -1             
 2008-01-08 00:00:00    138.9          0.8548         -1             
 2008-01-09 00:00:00    140.4          0.8552         -1             
 2008-01-10 00:00:00    141.3          0.846          -1             
 2008-01-11 00:00:00    140.2          0.7988         -1             
 2008-01-14 00:00:00    141.3          0.6151         -1             
 2008-01-15 00:00:00    138.2          0.3714         1   

I use the signal to create a DataMatrix of returns based on the trade signal:

>>> get_indicator_returns()

                   indicator_returns    
2008-01-02 00:00:00    NaN            
2008-01-03 00:00:00    0.000483       
2008-01-04 00:00:00    0.02451        
2008-01-07 00:00:00    0.0008492      
2008-01-08 00:00:00    0.01615        
2008-01-09 00:00:00    -0.01051       
2008-01-10 00:00:00    -0.006554      
2008-01-11 00:00:00    0.008069       
2008-01-14 00:00:00    -0.008063      
2008-01-15 00:00:00    0.02201 

What I ended up doing is this:

def get_compounded_indicator_cumulative(self):
    # needs: from numpy import zeros
    indicator_dm = self.get_indicator_returns()
    dates = indicator_dm.index

    indicator_returns = indicator_dm['indicator_returns']
    compounded = zeros(len(indicator_returns))

    # index 0 is NaN, so seed the compounding at index 1
    compounded[1] = indicator_returns[1]

    for i in range(2, len(indicator_returns)):
        compounded[i] = (1 + compounded[i-1]) * (1 + indicator_returns[i]) - 1

    data = {'compounded_returns': compounded}

    return DataMatrix(data, index=dates)

For some reason I really struggled with this one...

I'm in the process of converting all my price series to PyTables. Looks promising so far.
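Since the recurrence c[i] = (1 + c[i-1]) * (1 + r[i]) - 1 telescopes into a running product, the loop above can also be written with cumprod; a minimal sketch on made-up returns, treating the leading NaN as a zero return:

```python
import pandas as pd

# Hypothetical indicator returns; the first day is NaN as in the table above
indicator_returns = pd.Series([None, 0.000483, 0.02451, 0.0008492])

# cumprod of the growth factors reproduces the loop's recurrence
compounded = (1 + indicator_returns.fillna(0)).cumprod() - 1
```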

Jason Strimpel
  • @Strimp... please test this out instead of the `for` loop and tell me if it works... `compounded = (1 + compounded.shift(1))*(1 + indicator_returns) - 1`. You might need to reassign `compounded[1]` after it finishes, but this should be faster than iterating over the matrix line-by-line... – Mike Pennington Apr 05 '11 at 03:52
  • Ok, is there anything else I can assist with? – Mike Pennington Apr 06 '11 at 12:29
  • @Mike Pennington: Done, sorry, new to Stack :) – Jason Strimpel Apr 10 '11 at 13:16