0

I am trying to convert millions of epochs to a tuple (X,Y) where X (Boolean) is whether it was a weekend and Y is the minute of day it was in the range of (0,1440)

The simple, correct way to do this by converting to datetime:

def _epoch_to_dinfo(epoch):
    d = datetime.utcfromtimestamp(epoch) #SLOW AS F
    is_weekday = d.isoweekday() in range(1, 6)
    minute_of_day = d.hour*60 + d.minute
    return is_weekday, minute_of_day

is WAY too slow. I am looking for an approximation to this; below is my best attempt:

def _epoch_to_dinfo(epoch):  
    #return (epoch / 86400) % 7 not in [2,3], (epoch % 86400) / 60
    days_since_epoch = epoch / 86400
    days_after_thursday = days_since_epoch % 7  #the epoch was a thursday
    is_weekday = days_after_thursday not in [2,3]
    minute_of_day = (epoch % 86400) / 60
    return is_weekday, minute_of_day

Is there a faster way to do this?

Tommy
  • 12,588
  • 14
  • 59
  • 110
  • 1
    sorry, not an answer, but this just bugs me: `True if days_after_thursday in [2,3] else False` = `days_after_thursday in [2,3]` – yurib Oct 07 '15 at 19:19
  • @yurib, haha, fixed =) – Tommy Oct 07 '15 at 19:29
  • One micro-optimization might be `is_weekend = [False, False, True, True, False, False, False][days_after_thursday]` but it's unlikely to be significant. Is there anything wrong with the code you have now? – Mark Ransom Oct 07 '15 at 19:41
  • there is nothing wrong with it, it works, but it is still slow, though not nearly as slow as the first one – Tommy Oct 07 '15 at 19:42
  • 2
    @Tommy first of all, the code you posted doesn't work (first example), after fixing it a little bit, converting 1,000,000 epochs took 1.4 seconds so I don't know why you say it's slow... – yurib Oct 07 '15 at 19:42
  • can you please post the "fixed" code? I checked several against the first method and they all worked, so if the first is wrong, both are wrong. – Tommy Oct 07 '15 at 19:43
  • @Tommy where is epoch used in the first method? – roymustang86 Oct 07 '15 at 19:48
  • oh yeah, is_weekday was a function i defined. fixed. – Tommy Oct 07 '15 at 19:49
  • @MarkRansom: That would only gain you something if the literal were a `tuple` (so Python could store a single const tuple in the byte code at compile time and reuse it) and even then, only on Python 3 ([which makes `True`/`False` reserved words instead of plain names that can be reassigned](https://docs.python.org/dev/whatsnew/3.0.html#changed-syntax)). Otherwise, Python would construct that `list` over and over to perform the check, which costs more than the check itself. You can use the `dis` module to check byte code; the `list` (and Py2 `tuple`) load 7 times, then build, then subscript. – ShadowRanger Oct 07 '15 at 19:51
  • are you sure you are keeping track of all the daylight savings and leap seconds added etc? – user37203 Oct 07 '15 at 19:54
  • You could store the `list` (or Py2 `tuple`) to a variable somewhere outside the function (or as a default argument to the function) to get similar behavior if you're not able to use a `tuple` on Py3, but even then, unlikely to matter. – ShadowRanger Oct 07 '15 at 19:55
  • my approximation ignores leap seconds; it is an approximation as stated and not perfect – Tommy Oct 07 '15 at 19:55
  • @Tommy: One of your big expenses in the first version is the test `d.isoweekday() in range(1, 6)`. On Python 2, that's looking up the builtin `range` (LEGB lookup is a killer), constructing a `list`, `[1, 2, 3, 4, 5]` _every_ time that line is executed, then scanning it linearly. Changing it to the equivalent `1 <= d.isoweekday() < 6` single-handedly reduces per call cost from ~1.48 microseconds on my machine to ~1.17 microseconds, just over a 20% savings. Caching `utcfromtimestamp = datetime.uftfromtimestamp` outside the function and calling that saves another 5% or so. – ShadowRanger Oct 07 '15 at 20:14
  • Mind you, still slower than what you have (which is slightly wrong, since your `is_weekday` is testing for `is_weekend`), but the difference is largely noise given the overall inefficiencies of Python. – ShadowRanger Oct 07 '15 at 20:19
  • @Tommy I think you should mention in the question your current and target runtimes, It'll give a better idea as to what sort of optimization is necessary. – yurib Oct 07 '15 at 20:27
  • @user37203 IMO `datetime` doesn't keep track of leap seconds either. And epoch time is generally in UTC, which is DST-free. – Mark Ransom Oct 07 '15 at 20:48

2 Answers2

2

Assuming you really need the speed, the only savings to be gained (in CPython) is to reduce the amount of bytecode you're executing, and even storing to locals costs additional byte code work (even if it doesn't do much work for each byte code instruction, simply working through them has overhead). So basically, minimize intermediate storage (and therefore byte codes) by one-lining it as in your commented out code (although on really old Python, you'd need a tuple of constants for the not in check to avoid Python stupidly reconstructing a list each time):

def _epoch_to_dinfo(epoch):  
    return (epoch // 86400) % 7 not in (2, 3), (epoch % 86400) // 60

Just by one-lining, the cost per run in my Python 2.7 x86 install drops by ~23%.

You might think you'd be able to use divmod to calculate the quotient and remainder of epoch divided by 86400 at once, but the cost of looking up divmod from the builtin namespace (expensive thanks to LEGB search), calling it (substantially more expensive than syntax based calls like // and %), unpacking its results, and loading the unpacked results back from the stack mean it ends up costing substantially more than even the non-one-lined solution; unless the inputs are large enough that the actual math work done outweighs the lookup costs and function call overhead significantly (which usually means the numbers have to be large enough to invoke array based math and then some; using long in Py2, or ints that exceed the digit size, 15 or 30 bits for 32 and 64 bit systems, in Py3), divmod almost never saves time.

Similarly, the test of not in (2, 3) wins over all other approaches, not because it's logically faster, but because it simplifies to LOAD_CONST of the constant tuple and invoking COMPARE_OP for not in (after which the comparisons are done at the C layer); individual tests against 2 and 3 would load more constants, invoke more COMPARE_OPs and do conditional jumps and such in Python byte code and that's more expensive.

None of this advice applies to any interpreter besides CPython (and much of it might only apply to CPython 2.7), because it's all implementation details.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
0

You can pre-calculate all the Saturdays and Sundays and put these in a dictionary, using days since epoch as the key. Then you can do something like this:

saturdays = {d: True for d in range(2,5000,7)}  # pre-calculate
sundays = {d: True for d in range(3,5000,7)}
saturdays_and_sundays = {**saturdays, **sundays} # join dicts (Python 3.5+)

# in your function
days_since_epoch = epoch / 86400
minute_of_day = (epoch % 86400) / 60
if days_since_epoch in saturdays_and_sundays :
    return True, minute_of_day
return False, minute_of_day
Thane Plummer
  • 7,966
  • 3
  • 26
  • 30