Generally, when measuring (and comparing) performance, it's very important to measure exactly the part of interest.
Let's assume we want to measure and compare performance for 3 items (a, b, and c). Now, let's assign each a (theoretical) performance score:

- a: 1
- b: 4
- c: 8

As in time performance tests, we'll start from the premise that the lower the score, the better the performance.
In this case, the above scores (1, 4, 8 - also called absolute scores) are meaningful on their own and can be compared directly.
But let's also calculate the relative scores: one way is to take the variant that performs worst (c) and express each score as a fraction of it:

- a: 1 / 8 = 0.125
- b: 4 / 8 = 0.500
- c: 8 / 8 = 1.000
So far so good. The absolute and relative scores are visibly proportional.
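Just to make the computation concrete, here's a minimal (plain Python 3) sketch that derives the relative scores from the hypothetical absolute ones above:

scores = {"a": 1.0, "b": 4.0, "c": 8.0}  # the hypothetical absolute scores above

worst = max(scores.values())  # the variant that performs worst (c)
relative = {name: score / worst for name, score in scores.items()}
print(relative)  # {'a': 0.125, 'b': 0.5, 'c': 1.0}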
But in the real world, there are cases where measuring exactly one specific item is hard (almost impossible, when taking into account factors from outside the program). In those cases, the item plus some overhead are measured together, and the final score is the combination (not necessarily the arithmetical sum) of the item score and the overhead score. The goal is to keep the overhead as insignificant as possible, in order to reduce its impact (weight) on the overall score.
For example, let's take a huge overhead (call it d) with the score:

- d: 2

The measurements including the overhead become:

- a (a + d): 3
- b (b + d): 6
- c (c + d): 10
Things look a bit different. Now, the relative scores:

- a: (1 + 2) / (8 + 2) = 0.300
- b: (4 + 2) / (8 + 2) = 0.600
- c: (8 + 2) / (8 + 2) = 1.000
As seen, a's relative score is 2.4 times its previous value (without the overhead), even though a itself didn't change - only the overhead was added.
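To see the distortion numerically, here's a small (plain Python 3) sketch using the same hypothetical scores, computing the relative scores with and without the overhead:

scores = {"a": 1.0, "b": 4.0, "c": 8.0}  # hypothetical absolute scores
overhead = 2.0  # the overhead (d)
worst = max(scores.values())

without_d = {k: v / worst for k, v in scores.items()}
with_d = {k: (v + overhead) / (worst + overhead) for k, v in scores.items()}

print(without_d)  # {'a': 0.125, 'b': 0.5, 'c': 1.0}
print(with_d)  # {'a': 0.3, 'b': 0.6, 'c': 1.0}
print(with_d["a"] / without_d["a"])  # 2.4 - a's relative score got 2.4 times larger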
The same thing is happening in your case. The 2 ** 12
calculation is an overhead. I'm not saying that the measurements are wrong, but they aren't accurate either. What is certain is that if one variant performs better than another without the overhead, it will also perform better with it; however, as shown above, the ratios between the variants will be distorted.
I modified your code and added 3 more functions, where I simply got rid of the exponentiation.
code00.py:
#!/usr/bin/env python

import sys
import timeit
import random


def f0(mode, x, y):  # the most "explicit" solution, best for readability, ... but uses a str comparison
    if mode == 'mode1_method_foo':
        return 0  # IRL, many lines here
    elif mode == 'mode2_method_bar':
        return x ** 12  # IRL, many lines here too


def f1(mode, x, y):  # with int comparison
    if mode == 1:
        return 0
    elif mode == 2:
        return x ** 12


def f2(mode, x, y):  # with bool
    if mode:
        return 0
    else:
        return x ** 12


# The modified versions

def f3(mode, x, y):  # the most "explicit" solution, best for readability, ... but uses a str comparison
    if mode == 'mode1_method_foo':
        return 0  # IRL, many lines here
    elif mode == 'mode2_method_bar':
        return 1  # IRL, many lines here too


def f4(mode, x, y):  # with int comparison
    if mode == 1:
        return 0
    elif mode == 2:
        return 1


def f5(mode, x, y):  # with bool
    if mode:
        return 0
    else:
        return 1


x = random.random()
y = random.random()
args = ["mode2_method_bar", 2, False]


def main(*argv):
    results = {}
    funcs = [
        f0,
        f1,
        f2,
        f3,
        f4,
        f5,
    ]
    print("Testing functions")
    for i, func in enumerate(funcs):
        t = timeit.timeit(stmt="func(arg, x, y)", setup="from __main__ import {0:s} as func, x, y, args;arg=args[int(func.__name__[-1]) % 3]".format(func.__name__), number=10000000)
        results.setdefault(i // 3, []).append((func.__name__, t))
        print(" Done with {0:s}".format(func.__name__))
    print("\n Functions absolute scores (seconds)")
    for k in results:
        for result in results[k]:
            print(" {0:s}: {1:.6f}".format(*result))
    print("\n Functions relative scores (ratios - compared to the variant that took longest)")
    for k, v in results.items():
        print(" Batch {0:d}".format(k))
        longest = max(v, key=lambda x: x[-1])[-1]
        for result in v:
            print(" {0:s}: {1:.6f}".format(result[0], result[1] / longest))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q061250859]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code00.py
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32
Testing functions
Done with f0
Done with f1
Done with f2
Done with f3
Done with f4
Done with f5
Functions absolute scores (seconds)
f0: 3.833778
f1: 3.591715
f2: 3.083926
f3: 1.671274
f4: 1.467826
f5: 1.118479
Functions relative scores (ratios - compared to the variant that took longest)
Batch 0
f0: 1.000000
f1: 0.936861
f2: 0.804409
Batch 1
f3: 1.000000
f4: 0.878268
f5: 0.669238
Done.
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q061250859]> "e:\Work\Dev\VEnvs\py_pc032_02.07.17_test0\Scripts\python.exe" code00.py
Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 20:49:36) [MSC v.1500 32 bit (Intel)] 32bit on win32
Testing functions
Done with f0
Done with f1
Done with f2
Done with f3
Done with f4
Done with f5
Functions absolute scores (seconds)
f0: 3.947840
f1: 3.613213
f2: 3.384385
f3: 1.898074
f4: 1.604591
f5: 1.315465
Functions relative scores (ratios - compared to the variant that took longest)
Batch 0
f0: 1.000000
f1: 0.915238
f2: 0.857275
Batch 1
f3: 1.000000
f4: 0.845379
f5: 0.693053
Done.
Notes:
- I ran the program with both Python 2 and Python 3 (and as seen the latter has some speed improvements)
- The key point here is the drop in relative scores from Batch 0 (the original functions) to Batch 1 (the same functions without the x ** 12 exponentiation - meaning less overhead)
- The results may vary between runs (due to external factors, e.g. the OS scheduling the process on a CPU), so to get a picture that's as close as possible to reality, multiple runs should be performed (see the sketch right after these notes)
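For example (a hedged sketch, not part of the code above), timeit.repeat can be used to run each measurement several times and keep the minimum, which is usually the least disturbed value:

import timeit


def f(mode, x, y):  # same shape as the functions above
    if mode:
        return 0
    else:
        return 1


# Repeat the whole measurement 5 times and keep the best (lowest) timing
timings = timeit.repeat(stmt="f(False, 0.5, 0.5)", setup="from __main__ import f", repeat=5, number=1000000)
print(min(timings))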
There's one more massive overhead in your code: the fact that the code you want to measure is located in functions. Calling the function, passing the arguments and returning the value are quite costly compared to the comparison itself, and I'd bet that by timing just the comparisons, the relative (and also the absolute) scores would look quite different.
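As a rough sketch of how that could be checked (not part of your original code; the exact timings and ratio will vary):

import timeit

number = 10000000

# The comparison wrapped in a function: includes call, argument passing and return overhead
t_func = timeit.timeit(stmt="f(2)", setup="def f(mode):\n    return mode == 2", number=number)

# The bare comparison only
t_bare = timeit.timeit(stmt="mode == 2", setup="mode = 2", number=number)

print(t_func, t_bare)  # the function-call variant is expected to take noticeably longer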