I've been working with the CRSP stock price database, which I access through WRDS (Wharton Research Data Services, run by UPenn for academic research). I download the data with Python and insert it into a local MySQL database.
The data looks like this, with a primary key on (quote_date, security_id); tr is the daily total return:
quote_date  security_id  tr            accum_index
10-Jan-86   10002        null          1000
13-Jan-86   10002        -0.026595745  973.4042548
14-Jan-86   10002        0.005464481   978.7234036
15-Jan-86   10002        -0.016304348  962.7659569
16-Jan-86   10002        0             962.7659569
17-Jan-86   10002        0             962.7659569
20-Jan-86   10002        0             962.7659569
21-Jan-86   10002        0.005524862   968.0851061
22-Jan-86   10002        -0.005494506  962.765957
23-Jan-86   10002        0             962.765957
24-Jan-86   10002        -0.005524862  957.4468078
27-Jan-86   10002        0.005555556   962.7659569
28-Jan-86   10002        0             962.7659569
29-Jan-86   10002        0             962.7659569
30-Jan-86   10002        0             962.7659569
31-Jan-86   10002        0.027624309   989.3617013
3-Feb-86    10002        0.016129032   1005.319148
4-Feb-86    10002        0.042328041   1047.872338
5-Feb-86    10002        0.04568528    1095.744679
I need to calculate the accum_index column, which is a total-return index for each stock. It starts at 1000 on a security's first date and is then calculated as:
accum_index_t = accum_index_{t-1} * (1 + tr_t)
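Unrolling the recurrence shows what is actually being computed. Assuming, as in the sample data, that each security starts at 1000 on its first date (where tr is null), the index is a running product of gross returns, which can equivalently be written as the exponential of a running sum of log returns (valid as long as every tr_s > -1):

$$
\mathrm{accum\_index}_t
= \mathrm{accum\_index}_{t-1}\,(1 + tr_t)
= 1000 \prod_{s=1}^{t} (1 + tr_s)
= 1000 \exp\!\left( \sum_{s=1}^{t} \ln(1 + tr_s) \right)
$$

For example, 1000 × (1 - 0.026595745) × (1 + 0.005464481) ≈ 978.7234, matching the 14-Jan-86 row. The sum form is relevant because MySQL, like most SQL dialects, has no product aggregate but does have SUM, which can be used as a window function from MySQL 8.0 onward.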
The table has 80 million rows. I wrote some code that iterates through every security_id and calculates a cumulative product, like so:
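delimiter //

create procedure calc_accum_index()
begin
  -- start with the smallest security_id
  set @sid = (select min(security_id) from stock_prices);

  -- scratch table that holds one security's rows at a time
  create temporary table prices (
    quote_date datetime,
    security_id int,
    tr double null,
    accum_index double null,
    PRIMARY KEY (quote_date, security_id)
  );

  while @sid is not null
  do
    select 'security_id', @sid;  -- progress output
    set @accum = null;           -- reset the running index for each security

    -- copy this security's rows; security_id is the second column of the
    -- primary key, so this filter is not a leftmost-prefix match on PRIMARY
    insert into prices
    select quote_date, security_id, tr, accum_index
    from stock_prices
    where security_id = @sid;

    -- running product in date order; the first row (or any row whose tr is
    -- null) gets 1000.0 because @accum * (1 + tr) is null there
    update prices
    set accum_index = (@accum := ifnull(@accum * (1 + tr), 1000.0))
    order by quote_date;

    -- copy the computed values back into the main table
    update stock_prices p use index(PRIMARY), prices a use index(PRIMARY)
    set p.accum_index = a.accum_index
    where p.security_id = a.security_id
      and p.quote_date = a.quote_date;

    -- move on to the next security
    set @sid = (select min(security_id) from stock_prices where security_id > @sid);

    delete from prices;
  end while;

  drop temporary table prices;
end //

delimiter ;

call calc_accum_index();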
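For comparison, the same per-security running product can be computed in a single ordered pass over all rows instead of a separate round of queries per security. The sketch below is stdlib Python (the data is already loaded with Python); it assumes the rows arrive sorted by (security_id, quote_date) and mirrors the IFNULL(@accum * (1 + tr), 1000.0) rule from the UPDATE above. The function name accum_index, the BASE constant and the tuple layout are hypothetical, chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Optional, Tuple

BASE = 1000.0  # starting value, matching IFNULL(..., 1000.0) in the question

Row = Tuple[int, str, Optional[float]]  # hypothetical layout: (security_id, quote_date, tr)


def accum_index(rows: Iterable[Row]) -> Iterator[Tuple[int, str, float]]:
    """Yield (security_id, quote_date, accum_index) for every row.

    Assumes rows are sorted by (security_id, quote_date), so each security
    is one contiguous run and the whole table is handled in a single pass.
    """
    for sid, group in groupby(rows, key=itemgetter(0)):
        prev: Optional[float] = None  # reset at the start of each security
        for _, quote_date, tr in group:
            # Same rule as IFNULL(@accum * (1 + tr), 1000.0): no previous value
            # or a null tr (re)starts the index at BASE.
            prev = BASE if prev is None or tr is None else prev * (1.0 + tr)
            yield sid, quote_date, prev


# security_id 10002, copied from the sample table in the question
dates = ["10-Jan-86", "13-Jan-86", "14-Jan-86", "15-Jan-86", "16-Jan-86",
         "17-Jan-86", "20-Jan-86", "21-Jan-86", "22-Jan-86", "23-Jan-86",
         "24-Jan-86", "27-Jan-86", "28-Jan-86", "29-Jan-86", "30-Jan-86",
         "31-Jan-86", "3-Feb-86", "4-Feb-86", "5-Feb-86"]
trs = [None, -0.026595745, 0.005464481, -0.016304348, 0, 0, 0,
       0.005524862, -0.005494506, 0, -0.005524862, 0.005555556,
       0, 0, 0, 0.027624309, 0.016129032, 0.042328041, 0.04568528]
rows = [(10002, d, tr) for d, tr in zip(dates, trs)]

result = list(accum_index(rows))
for sid, d, value in result:
    print(sid, d, round(value, 4))
```

Against the real table, the rows could be streamed from MySQL with a query such as SELECT security_id, quote_date, tr FROM stock_prices ORDER BY security_id, quote_date through any DB-API driver, with the results written back in batches; that plumbing is an assumption and is not shown. Apart from the sort, the work is one pass over the rows, with no per-security lookups.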
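But this is too slow: it takes about a minute per security on my laptop, so calculating the full series this way is impractical. Is there a way to vectorise this, i.e. compute the index for all securities in one set-based pass instead of looping over them one at a time?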
Cheers, Steve