Generally, when measuring (and comparing) performance, it's very important to measure exactly the part of interest.
Let's assume we want to measure and compare performance for 3 items (a, b, and c). Now, let's assign each a (theoretical) performance score:

- a: 1
- b: 4
- c: 8

As in time performance tests, we'll start from the premise that the lower the score, the better the performance.
In this case, the above scores (1, 4, 8 - also called absolute scores) are meaningful on their own and can be compared directly.
But let's also calculate the relative scores: one way is to take the variant that performs worst (c) and express each score as a fraction of it:

- a: 1 / 8 = 0.125
- b: 4 / 8 = 0.500
- c: 8 / 8 = 1.000
So far so good. The absolute and relative scores are visibly proportional.
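Just to make the computation concrete, here's a minimal (plain Python 3) sketch that derives the relative scores from the hypothetical absolute ones above:

scores = {"a": 1.0, "b": 4.0, "c": 8.0}  # the hypothetical absolute scores above

worst = max(scores.values())  # the variant that performs worst (c)
relative = {name: score / worst for name, score in scores.items()}
print(relative)  # {'a': 0.125, 'b': 0.5, 'c': 1.0}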
But in the real world, there are cases where measuring exactly one specific item is hard (almost impossible, when taking into account factors from outside the program). In those cases, the item plus some overhead are measured together, and the final score is the combination (not necessarily the arithmetical sum) of the item score and the overhead score. The goal is to keep the overhead as insignificant as possible, in order to reduce its impact (weight) on the overall score.
For example, let's take a huge overhead (call it d) with the score:

- d: 2

The measurements including the overhead become:

- a (a + d): 3
- b (b + d): 6
- c (c + d): 10
Things look a bit different. Now, the relative scores:

- a: (1 + 2) / (8 + 2) = 0.300
- b: (4 + 2) / (8 + 2) = 0.600
- c: (8 + 2) / (8 + 2) = 1.000
As seen, a's relative score is 2.4 times its previous value (without the overhead), even though a itself didn't change - only the overhead was added.
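To see the distortion numerically, here's a small (plain Python 3) sketch using the same hypothetical scores, computing the relative scores with and without the overhead:

scores = {"a": 1.0, "b": 4.0, "c": 8.0}  # hypothetical absolute scores
overhead = 2.0  # the overhead (d)
worst = max(scores.values())

without_d = {k: v / worst for k, v in scores.items()}
with_d = {k: (v + overhead) / (worst + overhead) for k, v in scores.items()}

print(without_d)  # {'a': 0.125, 'b': 0.5, 'c': 1.0}
print(with_d)  # {'a': 0.3, 'b': 0.6, 'c': 1.0}
print(with_d["a"] / without_d["a"])  # 2.4 - a's relative score got 2.4 times larger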
The same thing is happening in your case. The 2 ** 12
calculation is an overhead. I'm not saying that the measurements are wrong, but they aren't accurate either. What is certain is that if one variant performs better than another without the overhead, it will also perform better with it; however, as shown above, the ratios between the variants will be distorted.
I modified your code and added 3 more functions, where I simply got rid of the exponentiation.
code00.py:
#!/usr/bin/env python

import sys
import timeit
import random


def f0(mode, x, y):  # the most "explicit" solution, best for readability, ... but uses a str comparison
    if mode == 'mode1_method_foo':
        return 0  # IRL, many lines here
    elif mode == 'mode2_method_bar':
        return x ** 12  # IRL, many lines here too


def f1(mode, x, y):  # with int comparison
    if mode == 1:
        return 0
    elif mode == 2:
        return x ** 12


def f2(mode, x, y):  # with bool
    if mode:
        return 0
    else:
        return x ** 12


# The modified versions

def f3(mode, x, y):  # the most "explicit" solution, best for readability, ... but uses a str comparison
    if mode == 'mode1_method_foo':
        return 0  # IRL, many lines here
    elif mode == 'mode2_method_bar':
        return 1  # IRL, many lines here too


def f4(mode, x, y):  # with int comparison
    if mode == 1:
        return 0
    elif mode == 2:
        return 1


def f5(mode, x, y):  # with bool
    if mode:
        return 0
    else:
        return 1


x = random.random()
y = random.random()
args = ["mode2_method_bar", 2, False]


def main(*argv):
    results = {}
    funcs = [
        f0,
        f1,
        f2,
        f3,
        f4,
        f5,
    ]
    print("Testing functions")
    for i, func in enumerate(funcs):
        t = timeit.timeit(stmt="func(arg, x, y)", setup="from __main__ import {0:s} as func, x, y, args;arg=args[int(func.__name__[-1]) % 3]".format(func.__name__), number=10000000)
        results.setdefault(i // 3, []).append((func.__name__, t))
        print(" Done with {0:s}".format(func.__name__))
    print("\n Functions absolute scores (seconds)")
    for k in results:
        for result in results[k]:
            print(" {0:s}: {1:.6f}".format(*result))
    print("\n Functions relative scores (ratios - compared to the variant that took longest)")
    for k, v in results.items():
        print(" Batch {0:d}".format(k))
        longest = max(v, key=lambda x: x[-1])[-1]
        for result in v:
            print(" {0:s}: {1:.6f}".format(result[0], result[1] / longest))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q061250859]> "e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code00.py
Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32
Testing functions
Done with f0
Done with f1
Done with f2
Done with f3
Done with f4
Done with f5
Functions absolute scores (seconds)
f0: 3.833778
f1: 3.591715
f2: 3.083926
f3: 1.671274
f4: 1.467826
f5: 1.118479
Functions relative scores (ratios - compared to the variant that took longest)
Batch 0
f0: 1.000000
f1: 0.936861
f2: 0.804409
Batch 1
f3: 1.000000
f4: 0.878268
f5: 0.669238
Done.
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q061250859]> "e:\Work\Dev\VEnvs\py_pc032_02.07.17_test0\Scripts\python.exe" code00.py
Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 20:49:36) [MSC v.1500 32 bit (Intel)] 32bit on win32
Testing functions
Done with f0
Done with f1
Done with f2
Done with f3
Done with f4
Done with f5
Functions absolute scores (seconds)
f0: 3.947840
f1: 3.613213
f2: 3.384385
f3: 1.898074
f4: 1.604591
f5: 1.315465
Functions relative scores (ratios - compared to the variant that took longest)
Batch 0
f0: 1.000000
f1: 0.915238
f2: 0.857275
Batch 1
f3: 1.000000
f4: 0.845379
f5: 0.693053
Done.
Notes:
- I ran the program with both Python 2 and Python 3 (and as seen the latter has some speed improvements)
- The key point here is the drop in relative scores from Batch 0 (the original functions) to Batch 1 (the same functions without the x ** 12 exponentiation - meaning less overhead)
- The results may vary between runs (due to external factors, e.g. the OS scheduling the process on a CPU), so to get a picture that's as close as possible to reality, multiple runs should be performed (see the sketch right after these notes)
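For example (a hedged sketch, not part of the code above), timeit.repeat can be used to run each measurement several times and keep the minimum, which is usually the least disturbed value:

import timeit


def f(mode, x, y):  # same shape as the functions above
    if mode:
        return 0
    else:
        return 1


# Repeat the whole measurement 5 times and keep the best (lowest) timing
timings = timeit.repeat(stmt="f(False, 0.5, 0.5)", setup="from __main__ import f", repeat=5, number=1000000)
print(min(timings))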
There's one more massive overhead in your code: the fact that the code you want to measure is located in functions. Calling the function, passing the arguments and returning the value are quite costly compared to the comparison itself, and I'd bet that by timing just the comparisons, the relative (and also the absolute) scores would look quite different.
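As a rough sketch of how that could be checked (not part of your original code; the exact timings and ratio will vary):

import timeit

number = 10000000

# The comparison wrapped in a function: includes call, argument passing and return overhead
t_func = timeit.timeit(stmt="f(2)", setup="def f(mode):\n    return mode == 2", number=number)

# The bare comparison only
t_bare = timeit.timeit(stmt="mode == 2", setup="mode = 2", number=number)

print(t_func, t_bare)  # the function-call variant is expected to take noticeably longer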