
I am currently developing a Python project where I am concerned about performance, because my CPU is constantly running at 90-98% of its capacity.

So I was thinking about what I could change in my code to make it faster, and noticed that I have a string variable that always receives one of two values:

state = "ok" 
state = "notOk"

Since it only has two values, I thought about changing it to a boolean, like:

isStateOk = True
isStateOk = False

Does it make any sense to do that? Is there a big difference, or any difference at all, between the speed of assigning a string to a variable and assigning a boolean to a variable?

I should also mention that I use this variable in about 30 if comparisons in my code, so maybe the comparison speed would be a benefit?

if state == "ok":  # Original comparison
if isStateOk:      # New comparison
Chagall
    For assignment, there would be no notable performance gain, but for comparison there would be a slight improvement. – Ranga Dec 12 '19 at 15:26
    Well, you should be using a boolean here for clarity to begin with. You will get marginal performance improvements using a construct like `if is_state_ok` vs `if state == "ok"`, but 30 comparisons will take a fraction of a fraction of a second in any case, so this would be a premature optimization. If you are worried about performance, you need to actually profile your code somehow. As for assignment, there is no difference. – juanpa.arrivillaga Dec 12 '19 at 15:47

2 Answers


Such a micro-optimization probably won't fix the code, as stated in the other answer. You should probably look at more general things first: algorithms, data types and structures, and how to use them effectively in Python (e.g. using map/filter/reduce or list comprehensions instead of explicit for loops).
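As a small illustration of that last point, here is a sketch of an explicit loop next to an equivalent list comprehension. The input data and variable names are made up for the example, and whether the comprehension is actually faster in your program is something you would have to measure.

```python
values = range(10)  # hypothetical input data, not from the question

# Explicit loop: append the square of each even value one at a time.
squares_loop = []
for v in values:
    if v % 2 == 0:
        squares_loop.append(v * v)

# Equivalent list comprehension: same result, built in a single expression.
squares_comp = [v * v for v in values if v % 2 == 0]

print(squares_loop == squares_comp)
```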

However, concerning the gory details of your question, try running the following test:

import time

def measure_time(a_function, times):
    # Time a single call; each test function runs its own loop of `times` iterations.
    start = time.perf_counter()
    a_function(times)
    end = time.perf_counter()
    print("{0:40} {1}".format(a_function.__name__, end - start))

def test_strings_eq_literal(n):
    stateOk = "ok"
    stateNotOK = "notOK"
    state = stateNotOK
    for i in range(n):
        state == "ok"

def test_strings_is_literal(n):
    stateOk = "ok"
    stateNotOk = "notOK"
    state = stateNotOk
    for i in range(n):
        state is "ok"  # careful with this - it works for simple, id-like strings only
    
def test_strings_eq(n):
    stateOk = "ok"
    stateNotOK = "notOK"
    state = stateNotOK
    for i in range(n):
        state == stateOk

def test_strings_is(n):
    stateOk = "ok"
    stateNotOk = "notOK"
    state = stateNotOk
    for i in range(n):
        state is stateOk  # careful with this - it works for simple, id-like strings only

def test_bool_eq_literal(n):
    stateOk    = True
    stateNotOk = False
    state      = stateNotOk
    for i in range(n):
        state == True

def test_bool_is_literal(n):
    stateOk    = True
    stateNotOk = False
    state      = stateNotOk
    for i in range(n):
        state is True
        
def test_bool_eq(n):
    stateOk    = True
    stateNotOk = False
    state      = stateNotOk
    for i in range(n):
        state == stateOk

def test_bool_is(n):
    stateOk    = True
    stateNotOk = False
    state      = stateNotOk
    for i in range(n):
        state is stateOk
        
n = 100000000
measure_time( test_strings_eq_literal, n )
measure_time( test_strings_is_literal, n )
measure_time( test_strings_eq, n )
measure_time( test_strings_is, n )
measure_time( test_bool_eq_literal, n )
measure_time( test_bool_is_literal, n )
measure_time( test_bool_eq, n )
measure_time( test_bool_is, n )

I am getting (times in seconds):

test_strings_eq_literal                  3.6397838770062663
test_strings_is_literal                  2.9926898650010116
test_strings_eq                          3.7794520660536364
test_strings_is                          3.0217343979747966
test_bool_eq_literal                     3.4703008759533986
test_bool_is_literal                     2.836865450022742
test_bool_eq                             3.5056013059802353
test_bool_is                             2.847688327950891

which suggests that you gain the most from using is instead of == (up to around 20%), even with some strings. But be careful: is works reliably only for simple, short, id-like strings, so you have to be sure what you're doing here. The interpreter even gives a warning in one of these cases (is with a string literal, which newer Python versions flag with a SyntaxWarning), so it is better not to do this.
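To see why is is risky for strings, here is a minimal sketch, assuming CPython. The variable names are made up for the example. Two strings with the same characters can be different objects when one of them is built at run time, and sys.intern() is the standard way to get one shared object per value.

```python
import sys

literal = "not ok"                    # a string constant in the source
built = " ".join(["not", "ok"])       # an equal string assembled at run time

equal_by_value = literal == built     # compares characters
same_object = literal is built        # compares identity; False in CPython here

# sys.intern() returns one canonical object per distinct string value,
# so identity comparison is only dependable between interned strings.
interned_same = sys.intern(literal) is sys.intern(built)

print(equal_by_value, same_object, interned_same)
```

So is only gives the right answer when you control where every string involved comes from; == is the safe default.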

A tiny bit seems to be gained by using a literal instead of a variable in the comparison (kind of expected, since some extra layer of dereferencing must happen behind the scenes), but that's not really worth the attention.
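If you want to look at what the interpreter does in each case, the dis module lists the bytecode. This is only a sketch: the two functions and the opnames helper are made up for the example, and the exact instruction names vary between CPython versions, so no expected output is shown.

```python
import dis

def compare_with_literal(state):
    # Right-hand side is a constant stored in the function's code object.
    return state == "ok"

def compare_with_variable(state, state_ok):
    # Right-hand side is a local variable.
    return state == state_ok

def opnames(func):
    """Return the list of bytecode instruction names for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

print(opnames(compare_with_literal))
print(opnames(compare_with_variable))
```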

What is not quite accurate in the other answer (and what actually prompted me to answer) is the claim that a Boolean is technically better. Booleans and simple id-like strings (and only such strings!) are both singleton objects in Python, so there is little difference in performance in cases like yours. However, this is not the case for longer strings (with spaces etc.): such strings cannot be compared with is, and comparing them with == will be slower than using a Boolean.

But as said above, look at more general and basic things in your program first. Changing just this probably won't fix your problem.
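A profiler is the usual way to find out where the time really goes. Below is a minimal sketch using the standard cProfile and pstats modules. The busy_work and main functions are hypothetical stand-ins for your own program, not anything from the question.

```python
import cProfile
import io
import pstats

def busy_work(n):
    """Hypothetical stand-in for the real program's hot path."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def main():
    return busy_work(200_000)

# Collect profiling data only while main() runs.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Print the five entries with the largest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The functions at the top of the report are the ones worth optimizing; a string comparison is unlikely to be among them.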

t-w

That's not going to fix the program using 90-98% CPU, but technically yes, using a Boolean is better.

You can also use is instead of ==:

isStateOk = True

if isStateOk is True:
    pass  # Do stuff

Edit: Never mind, in https://hg.python.org/cpython/rev/01a7e66525c2/ they already made the Python interpreter convert == True to is True under the hood, so there is no performance difference between writing it either way.

While it is a good idea all around to use Booleans here, since the purpose of Booleans is to represent an ok/not ok state, it's not going to give any noticeable performance improvement.
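You can check this on your own machine with the standard timeit module. This is a rough sketch, with the variable name is_state_ok chosen for the example. The absolute numbers depend on the machine and the Python version, so none are quoted here.

```python
import timeit

# Time one million evaluations of each form of the test.
string_time = timeit.timeit(
    'if state == "ok": pass', setup='state = "notOk"', number=1_000_000
)
bool_time = timeit.timeit(
    "if is_state_ok: pass", setup="is_state_ok = False", number=1_000_000
)

print(f"string comparison: {string_time:.4f} s per million checks")
print(f"boolean test:      {bool_time:.4f} s per million checks")
```

Whatever the difference turns out to be, it is spread over a million checks, so about 30 comparisons will not explain a CPU sitting at 90-98%.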

hostingutilities.com