As part of an exercise using some Quandl stock exchange data, I was asked to calculate the largest change in closing price between two consecutive trading days for a particular stock in 2017. I came up with several approaches and timed them. I expected itertools.islice to be faster, and it wasn't. (My expectations were set by this SO answer.) As an additional detail, I also want to print the two trading dates between which the largest change occurred, so I need the date info as well. Initially I only found solutions that use a for loop for that.
A couple of questions:
- Did I approach the timing correctly - ensuring the only thing in the cell is the code I wanted to time etc?
- Is the main reason for my results that I only have 255 items in my dict so local resources aren't taxed and loading the full list is trivial?
I was working in a Jupyter notebook, which is available here - this question is related to Q5.
My data is a dict of namedtuples:
{'2017-12-28': Tradeday(date='2017-12-28', open=51.65, high=51.82, low=51.43, close=51.6, change=None, traded_vol=40660.0, turn_over=2099024.0, last_price=None, d_trad_unit=None, d_turnover=None),
'2017-12-29': Tradeday(date='2017-12-29', open=51.76, high=51.94, low=51.45, close=51.76, change=None, traded_vol=34640.0, turn_over=1792304.0, last_price=None, d_trad_unit=None, d_turnover=None)}
Which I then sorted into an OrderedDict:
o_data = OrderedDict(sorted(data.items(), key=lambda t:t[0]))
Method 4 - flatten data into list of (k,v) tuples and iterate over a range:
od_list = list(o_data.items())
max_change, max_st, max_ed = 0, '', ''
for i in range(len(o_data)-1):
    change = abs(od_list[i][1].close-od_list[i+1][1].close)
    if change > max_change:
        max_change, max_st, max_ed = change, od_list[i][1].date, od_list[i+1][1].date
Using %%timeit on the cell I got these results:
117 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
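To check the timing outside the notebook, the same measurement could be sketched with the standard timeit module. This is only a rough equivalent of the %%timeit cell: the cut-down Tradeday record, the synthetic 255-day data, and the function name method_4 are assumptions for illustration, not the real Quandl data, and it reports the best of 7 runs rather than a mean and standard deviation.

```python
import timeit
from collections import OrderedDict, namedtuple

# Hypothetical cut-down record; the real Tradeday has more fields.
Tradeday = namedtuple("Tradeday", ["date", "close"])

# Synthetic stand-in for the 255 trading days; the values are made up.
o_data = OrderedDict(
    (f"day-{i:03d}", Tradeday(f"day-{i:03d}", 50.0 + (i * 7 % 13) * 0.1))
    for i in range(255)
)

def method_4():
    # Same logic as Method 4, wrapped in a function so timeit can call it.
    od_list = list(o_data.items())
    max_change, max_st, max_ed = 0, '', ''
    for i in range(len(od_list) - 1):
        change = abs(od_list[i][1].close - od_list[i + 1][1].close)
        if change > max_change:
            max_change, max_st, max_ed = change, od_list[i][1].date, od_list[i + 1][1].date
    return max_change, max_st, max_ed

# repeat=7 mirrors the "7 runs" that %%timeit reports; number is loops per run.
runs = timeit.repeat(method_4, repeat=7, number=1000)
per_loop_us = min(runs) / 1000 * 1e6
print(f"best of 7: {per_loop_us:.1f} µs per loop")
```

Wrapping the code in a function means the list creation is included in the timed work, just as it is when the whole cell is timed.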
Method 5 - using enumerate and avoiding making a list in memory:
from itertools import islice

mx, st, en = 0, '', ''
for i, v in enumerate(o_data.values()):
    if i < len(o_data)-1:
        # each islice call walks o_data.values() from the start to reach item i+1
        ch = abs(v.close-next(islice(o_data.values(), i+1, i+2)).close)
        if ch > mx:
            mx, st, en = ch, v.date, next(islice(o_data.values(), i+1, i+2)).date
Using %%timeit on the cell I got these results:
1.55 ms ± 12.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
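For comparison, here is a minimal sketch of a single-pass variant that still avoids building a list: it pairs each day with the next one using zip and a single islice call, rather than creating a fresh islice on every step as Method 5 does. The cut-down Tradeday record, the sample values, and the function name largest_close_change are assumptions for illustration, not part of the notebook, and I have not timed it.

```python
from collections import OrderedDict, namedtuple
from itertools import islice

# Hypothetical cut-down record; the real Tradeday has more fields.
Tradeday = namedtuple("Tradeday", ["date", "close"])

def largest_close_change(o_data):
    """Return (change, start_date, end_date) for the largest absolute
    change in close between consecutive entries of an ordered mapping."""
    values = o_data.values()
    # Pair each day with its successor; islice skips the first item once.
    pairs = zip(values, islice(values, 1, None))
    best = max(pairs, key=lambda p: abs(p[0].close - p[1].close), default=None)
    if best is None:
        # Fewer than two trading days: nothing to compare.
        return 0, '', ''
    prev_day, next_day = best
    return abs(prev_day.close - next_day.close), prev_day.date, next_day.date

# Made-up sample data in date order.
sample = OrderedDict([
    ("2017-12-27", Tradeday("2017-12-27", 51.0)),
    ("2017-12-28", Tradeday("2017-12-28", 51.6)),
    ("2017-12-29", Tradeday("2017-12-29", 51.76)),
])
result = largest_close_change(sample)
print(result)
```

One difference from the loops above: if every change is zero, this returns the first pair of dates instead of empty strings.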
edit: (method names/numbers updated to match those I used in the performance notebook)