
As part of an exercise using some Quandl stock exchange data, I was asked to calculate the largest change in closing price between two consecutive trading days for a particular stock in 2017. I came up with different approaches and timed them. I expected itertools.islice to be faster, and it wasn't (my expectations were set by this SO answer). One additional detail: I want to print out which two trading dates the largest change occurred between, so I need the date info as well - initially I only found solutions with a for loop for that.

A couple of questions:

  • Did I approach the timing correctly - ensuring the only thing in the cell is the code I wanted to time, etc.?
  • Is the main reason for my results that I only have 255 items in my dict, so local resources aren't taxed and loading the full list is trivial?

I was working in a Jupyter notebook, which is available here - this question is related to Q5.

My data is a dict of namedtuples:

{'2017-12-28': Tradeday(date='2017-12-28', open=51.65, high=51.82, low=51.43, close=51.6, change=None, traded_vol=40660.0, turn_over=2099024.0, last_price=None, d_trad_unit=None, d_turnover=None),
 '2017-12-29': Tradeday(date='2017-12-29', open=51.76, high=51.94, low=51.45, close=51.76, change=None, traded_vol=34640.0, turn_over=1792304.0, last_price=None, d_trad_unit=None, d_turnover=None)}

Which I then sorted into an OrderedDict (the keys are ISO dates, so sorting them lexicographically also sorts them chronologically):

o_data = OrderedDict(sorted(data.items(), key=lambda t:t[0]))
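
For reference, a minimal sketch of how the Tradeday records could be defined (field names are taken from the sample above; the actual loading code is in the notebook):

from collections import namedtuple, OrderedDict

Tradeday = namedtuple('Tradeday', [
    'date', 'open', 'high', 'low', 'close', 'change',
    'traded_vol', 'turn_over', 'last_price', 'd_trad_unit', 'd_turnover'])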

Method 4 - flatten the data into a list of (k, v) tuples and iterate over a range:

od_list = list(o_data.items())  # flatten once into [(date, Tradeday), ...]
max_change, max_st, max_ed = 0, '', ''

for i in range(len(od_list) - 1):
    # compare each day's close to the next day's close
    change = abs(od_list[i][1].close - od_list[i+1][1].close)
    if change > max_change:
        max_change, max_st, max_ed = change, od_list[i][1].date, od_list[i+1][1].date

Using %%timeit on the cell I got these results:

117 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Method 5 - using enumerate and islice, avoiding making a list in memory:

from itertools import islice

mx, st, en = 0, '', ''
for i, v in enumerate(o_data.values()):
    if i < len(o_data) - 1:
        # islice walks the values view from the start to reach item i+1
        ch = abs(v.close - next(islice(o_data.values(), i+1, i+2)).close)
        if ch > mx:
            mx, st, en = ch, v.date, next(islice(o_data.values(), i+1, i+2)).date

Using %%timeit on the cell I got these results:

1.55 ms ± 12.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

edit: (method names/numbers updated to match those I used in the performance notebook)

Alison K
    Did you try `max(abs(u.close-v.close) for u,v in zip(o_data.values(),o_data.values()[1:]))`? – DYZ Jun 09 '18 at 04:22
  • Thanks for the quick response @DyZ. Just trying that now - got 'odict_values' object is not subscriptable... `max(abs(u.close-v.close) for u,v in zip(list(o_data.values()),list(o_data.values())[1:]))` worked... although I'll still need the for loop to get at the start and end dates for the range, right? – Alison K Jun 09 '18 at 04:30
  • You said you only needed the largest change, not the dates. Please update your question if you want something else. – DYZ Jun 09 '18 at 04:31
  • Yup - added additional details to clarify why I've got the three variables `max_change, max_st, max_ed = 0, '', ''` and `mx, st, en = 0, '', '' ` – Alison K Jun 09 '18 at 04:41
  • Thanks once again DyZ - I was able to extend your suggestion to use zip() to get at the date values I wanted. `max([(abs(t.close-p.close), t.date, p.date) for t, p in zip(list(o_data.values()),list(o_data.values())[1:])])` – Alison K Jun 10 '18 at 07:24

1 Answer


So, with much thanks to DyZ, who set me on the right track, I have answered my own question(s):

I figured out how to get the dates in a one line comprehension:

max([(abs(t.close-p.close), t.date, p.date) for t, p in zip(list(o_data.values()),list(o_data.values())[1:])], key=lambda x: x[0])

which simplifies, because max compares tuples element by element and the change is the first element:

max([(abs(t.close-p.close), t.date, p.date) for t, p in zip(list(o_data.values()),list(o_data.values())[1:])])
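
As an aside, the two list() copies can be avoided by pairing the values lazily (a sketch, not code from the notebook):

from itertools import islice

# pair each day with the next one without materialising two full lists
pairs = zip(o_data.values(), islice(o_data.values(), 1, None))
max((abs(t.close - p.close), t.date, p.date) for t, p in pairs)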

I also explored my performance questions and created this Jupyter notebook: Q5 Performance

The simple in-memory list (method 4) worked well with 255 days of data, but when I increased that to 4643 days it slowed down significantly. The enumerate with islice (method 5) didn't slow down nearly as much, and with the larger data set it was faster than either the simple in-memory list or the one-line list comprehension (method 3).
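
For anyone wanting to reproduce the scaling test, synthetic data along these lines should work (purely illustrative - the notebook uses real Quandl data, and Tradeday is the namedtuple from the question):

import random
from datetime import date, timedelta

big_data, d = {}, date(2000, 1, 1)
for _ in range(4643):
    close = round(random.uniform(40.0, 60.0), 2)
    big_data[d.isoformat()] = Tradeday(d.isoformat(), None, None, None, close,
                                       None, None, None, None, None, None)
    d += timedelta(days=1)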

(method names/numbers updated to match those I used in the performance notebook)

I'd be very happy if someone wants to add some theory behind the observed performances!

Alison K