Bit of a puzzler here.
I have a database table with a set of statistic columns: a bunch of averages that are all very similar.
It was really easy to set up a function to iterate over my data set a bunch of times, and update rows in the database a bunch of times.
BUT it's ridiculously slow, and a big flat script that does each set of updates exactly once would take a long time to write and be a pain to maintain.
I was wondering if there is a way to stay OOP but avoid so much redundant seeking / number crunching.
SSIS would have let me pull the data out once, merge everything at the end, and throw it back into the database, but I'm not sure how I'd do that with Python...
So I defined a function in python that does an average of n days:
def moving_average(conn, cur, dates, days, stock_id, col_name):
    if len(dates) < days:
        print "WARNING: Cannot do a " + str(days) + " day moving average for stock id " + str(stock_id) + "! Not enough data!"
    else:
        # each window runs forward from row x: dates[x] .. dates[x + days - 1]
        for x in range(0, len(dates) - days + 1):
            row_id = dates[x][0]
            numerator = 0
            for i in range(0, days):
                # sum the whole window, not the same row over and over
                numerator = numerator + dates[x + i][3]
            # float(days) keeps integer division from truncating the average
            moving_average_for_period = round(numerator / float(days), 2)
            # parameterized query instead of string concatenation; col_name
            # still gets interpolated, so it must come from trusted code
            cur.execute("UPDATE historical_prices SET " + col_name + " = %s WHERE id = %s;",
                        (moving_average_for_period, row_id))
        # one commit per call instead of one per row
        conn.commit()
So far so good.
Then I call:
# now do the moving averages
moving_average(conn, cur, dates, 5, stock_id, 'five_day_ma')
moving_average(conn, cur, dates, 10, stock_id, 'ten_day_ma')
moving_average(conn, cur, dates, 15, stock_id, 'fifteen_day_ma')
moving_average(conn, cur, dates, 30, stock_id, 'thirty_day_ma')
moving_average(conn, cur, dates, 60, stock_id, 'sixty_day_ma')
moving_average(conn, cur, dates, 120, stock_id, 'onetwenty_day_ma')
See my problem? I do my I/O 6x.
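What I think I want is the SSIS pattern: compute every window in one pass over the dates list I already have in memory, then write each row back once. Here's a rough, untested sketch of what I mean (I'm assuming the price sits at index 3 of each row like in my function, and a DB-API driver such as psycopg2 or MySQLdb that uses %s placeholders; sqlite3 would use ? instead):

```python
def all_moving_averages(dates, windows):
    """Compute every moving-average column in a single pass over the data.

    dates   -- rows with the row id at index 0 and the price at index 3
               (that layout is an assumption)
    windows -- dict mapping column name -> window length in days
    Returns [(row_id, {col_name: value, ...}), ...] for rows with enough data.
    """
    prices = [row[3] for row in dates]
    results = []
    for x in range(len(dates)):
        row_values = {}
        for col_name, days in windows.items():
            # only rows with a full window of data get a value
            if x + days <= len(prices):
                avg = sum(prices[x:x + days]) / float(days)
                row_values[col_name] = round(avg, 2)
        if row_values:
            results.append((dates[x][0], row_values))
    return results


def write_back(conn, cur, averages):
    # one UPDATE per row that sets all the columns at once, and a single
    # commit at the end, instead of one UPDATE + commit per row per window
    for row_id, vals in averages:
        cols = sorted(vals)
        cur.execute(
            "UPDATE historical_prices SET "
            + ", ".join(c + " = %s" for c in cols)
            + " WHERE id = %s;",
            tuple(vals[c] for c in cols) + (row_id,))
    conn.commit()
```

Then the six calls would collapse into something like:

windows = {'five_day_ma': 5, 'ten_day_ma': 10, 'fifteen_day_ma': 15,
           'thirty_day_ma': 30, 'sixty_day_ma': 60, 'onetwenty_day_ma': 120}
write_back(conn, cur, all_moving_averages(dates, windows))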
Thanks in advance for any help.
Cheers,
Dan