0

I want to do a conversion like this:

a = [ 0, 10, 3, 2, 0, 2 ]

def covert_to_boolean(a):
    ...
    return a_converted

a_converted = [ 0, 1, 1, 1, 0, 1]

What would be the easiest way to do this conversion?

seralouk
Tae

7 Answers

7

To convert to true Booleans, you could just use:

def covert_to_boolean(a):
    return [bool(x) for x in a]

This returns

[False, True, True, True, False, True]

If you'd prefer them as 0s and 1s, then:

    return [int(bool(x)) for x in a]

Would return:

[0, 1, 1, 1, 0, 1]
David Buck
4

Not actually suggesting this unless the code is the hottest code in your program, but there are ways to improve on:

def covert_to_boolean(a):
    return [bool(x) for x in a]
    # Or the straightforward way of converting back to 1/0:
    # return [int(bool(x)) for x in a]

First off, if a is large enough, since int/bool are built-ins implemented in C, you can use map to remove byte code interpreter overhead:

def covert_to_boolean(a):
    return [*map(bool, a)]
    # Or converting back to 1/0:
    # return [*map(int, map(bool, a))]

Another saving comes from avoiding the bool constructor (C constructor calls have unavoidable overhead on CPython, even when the result doesn't actually "construct" anything). Replacing it with operator.truth (a plain function taking exactly one argument, which CPython heavily optimizes) can reduce overhead by a further 40%:

>>> import random
>>> from operator import truth
>>> a = random.choices([*[0] * 100, *range(1, 101)], k=1000)
>>> %%timeit -r5
... [bool(x) for x in a]
...
...
248 µs ± 7.82 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map(bool, a)]
...
...
140 µs ± 2.5 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5
... [*map(truth, a)]
...
...
81.3 µs ± 3.91 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

map(bool improved on the list comprehension by about 45%, and was in turn beaten by map(truth by about 40%; map(truth took almost exactly one third the time of the list comprehension.
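
Outside IPython, where %%timeit is not available, a similar comparison can be sketched with the standard-library timeit module. This is a minimal sketch, not the setup that produced the numbers above: the helper name time_us and the loop count are assumptions chosen for illustration, the lambda wrappers add a small constant call overhead to every candidate, and absolute timings will differ by machine and Python version.

```python
import random
import timeit
from operator import truth

# Same shape of input as above: 1000 values, roughly half of them zero.
a = random.choices([*[0] * 100, *range(1, 101)], k=1000)

candidates = {
    "list comprehension": lambda: [bool(x) for x in a],
    "map(bool)": lambda: [*map(bool, a)],
    "map(truth)": lambda: [*map(truth, a)],
}

def time_us(func, number=200):
    """Best average time per call, in microseconds (hypothetical helper)."""
    best = min(timeit.repeat(func, number=number, repeat=5))
    return best / number * 1e6

# All spellings must produce the same list before speed is worth comparing.
results = {name: func() for name, func in candidates.items()}
assert len({tuple(r) for r in results.values()}) == 1

for name, func in candidates.items():
    print(f"{name:>20}: {time_us(func):8.1f} us per call")
```

Taking the minimum over the repeats, rather than the mean, is a common way to reduce noise from other processes; it is a design choice here, not something the measurements above rely on.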

If the result must be an int, we could expand it to [*map(int, map(truth, a))], but again, int is a constructor, and even though it returns singleton values (CPython caches single copies of -5 through 256 as an implementation detail), it still pays constructor overhead (worse, because it can take keyword arguments). There is no equivalent "convert to true int" function like bool has operator.truth, but you can cheat your way into one by "adding to 0":

>>> %%timeit -r5
... [int(bool(x)) for x in a]
...
...
585 µs ± 65.2 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)

>>> %%timeit -r5
... [*map(int, map(bool, a))]
...
...
363 µs ± 58.6 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)

>>> %%timeit -r5
... [*map((0).__add__, map(truth, a))]
...
...
168 µs ± 2.2 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

(0).__add__ just takes advantage of the fact that adding a bool to 0 produces either 0 or 1, and __add__ has far lower overhead than a constructor; in this case, the switch from list comprehension to map (even nested map) saved nearly 40%, switching from int/bool to (0).__add__/truth saved nearly 55% off what remained, for a total reduction in runtime of over 70%.
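
The claim that adding to 0 yields plain ints can be checked directly. A small sketch, using the list from the question as sample input, confirming that the result matches the int(bool(x)) spelling and that the elements are exact ints rather than bools:

```python
from operator import truth

sample = [0, 10, 3, 2, 0, 2]

via_constructors = [int(bool(x)) for x in sample]
via_add = [*map((0).__add__, map(truth, sample))]

# Same values either way.
assert via_add == via_constructors == [0, 1, 1, 1, 0, 1]

# The elements are exact ints, not bools: 0 + True is the int 1.
assert all(type(x) is int for x in via_add)
print(via_add)  # [0, 1, 1, 1, 0, 1]
```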

Again, to be clear, don't do this unless:

  1. You've profiled, and converting really is the critical path in your code, speed-wise, and
  2. The inputs aren't too small (if a were only five elements, the setup overhead for calling map would outweigh the tiny savings from avoiding byte code per loop)

but when it comes up, it's good to know about. bool is one of the slowest things in Python, in terms of overhead:productive work ratio; int of already int-like things is similarly bad.

There is one last thing to check though. Maybe pushing things to syntax, avoiding function calls, might save more. As it happens, the answer is "it does, for one of them":

>>> %%timeit -r5
... [not not x for x in a]  # Worse than map
...
...
122 µs ± 6.6 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5
... [0 + (not not x) for x in a]  # BETTER than map!!!
...
...
158 µs ± 22.4 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

>>> %%timeit -r5
... [0 + x for x in map(truth, a)]  # Somehow not the best of both worlds...
...
...
177 µs ± 5.77 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

While [not not x for x in a] lost to [*map(truth, a)], [0 + (not not x) for x in a] actually beat [*map((0).__add__, map(truth, a))] (as it happens, there is some overhead in (0).__add__ being invoked through a wrapper around the tp_add slot, which can be avoided by actually using + at the Python layer). Mixing the best of each solution (map(truth with 0 + in a list comprehension) didn't actually benefit us though (re-adding the bytecode overhead was roughly a fixed cost, and not not beats even operator.truth). The point is that none of this is worth it unless you actually need it, and performance can be unintuitive. I had code that needed it, once upon a time, so you benefit from my testing.

ShadowRanger
2

You can use the and operator in a list comprehension to keep the code both fast and readable:

def covert_to_boolean(a):
    return [i and 1 for i in a]

This approach is faster than @ShadowRanger's fastest approach, as demonstrated here: https://repl.it/@blhsing/NeglectedClientsideLanserver

blhsing
  • Ah, nice. Takes advantage of the values already being `int`. My approach is more general (works with any input, not just `int`), but yeah, it does needlessly convert to `bool` and back when half the values are already correct. When the desired result is `True`/`False`, `[*map(truth, a)]` beats this, but this beats my solutions that end up as `1`/`0`. – ShadowRanger Nov 22 '19 at 01:17
  • Note that your test case is for a fairly small input; if you want asymptotic performance to avoid overweighting setup costs, I'd suggest going with a much longer `a` (if you're only doing small `a` anyway, the performance doesn't matter much). I went with 1000 elements for that reason. – ShadowRanger Nov 22 '19 at 01:23
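
The generality point raised in the comments can be seen with a short sketch. `x and 1` returns `x` itself whenever `x` is falsy, so it only yields `0` when the falsy inputs are already the int `0`. The function names and the mixed-type list below are made up for illustration and do not come from the question:

```python
from operator import truth

def and_one(values):
    # Returns the falsy element itself, or 1 for truthy elements.
    return [x and 1 for x in values]

def add_truth(values):
    # Always returns exact ints, whatever the input types are.
    return [0 + truth(x) for x in values]

ints = [0, 10, 3, 2, 0, 2]
mixed = [0, "", "a", None, [], 2.5]  # hypothetical non-int input

print(and_one(ints))     # [0, 1, 1, 1, 0, 1]
print(add_truth(ints))   # [0, 1, 1, 1, 0, 1]
print(and_one(mixed))    # [0, '', 1, None, [], 1]
print(add_truth(mixed))  # [0, 0, 1, 0, 0, 1]
```

For the all-int input in the question the two agree, which is why the `and` version is a valid shortcut there.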
0

Two one-line solutions:

def covert_to_boolean(a):
    return [1 if i != 0 else 0 for i in a]
# [0, 1, 1, 1, 0, 1]

# OR
def covert_to_boolean(a):
    return [bool(i) * 1 for i in a]
# [0, 1, 1, 1, 0, 1]
seralouk
  • Why bother with `True*1`/`False*1` when you can just do `1`/`0`: `[1 if i !=0 else 0 for i in a]`. The `*1` trick is useful if you had to have a boolean (e.g. `[(i != 0)*1 for i in a]`), though `+0` uses a cheaper fake operation, but as long as you're using ternary selection, it's unnecessary. – ShadowRanger Nov 22 '19 at 00:34
  • True. Initially I thought the OP wanted boolean output, so I built my answer that way; then I saw the 0/1 and, to fix it fast, I did BOOL*1/0. – seralouk Nov 22 '19 at 07:21
0

Not sure if you wanted b or c, so here's both

>>> a = [ 0, 10, 3, 2, 0, 2 ]
>>> b = [bool(i) for i in a]
>>> b
[False, True, True, True, False, True]
>>> c = [int(bool(i)) for i in a]
>>> c
[0, 1, 1, 1, 0, 1]
Hymns For Disco
0

Never mind the lapses in terminology; here is a solution using list comprehension that you can study (assuming you are a student):

a=[2,0,12,45,0,0,99]
b=[1 if i != 0 else 0 for i in a]
print(b)
[1, 0, 1, 1, 0, 0, 1]
gk_2000
0

If you are trying to convert your values to 0 and 1, I think the most elegant way would be:

a_converted = [1 if e else 0 for e in a]

This checks if e for each e in a: a non-zero e becomes 1, and a zero e becomes 0.
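
A sketch of how that truthiness test behaves next to the explicit != 0 comparison used in other answers. The two agree for ints, but differ for values such as None or an empty string, which are falsy yet not equal to 0. The function names and the second input list are illustrative assumptions, not part of the question:

```python
def by_truthiness(values):
    # 1 for any truthy element, 0 for any falsy one.
    return [1 if e else 0 for e in values]

def by_comparison(values):
    # 1 for any element not equal to 0, 0 otherwise.
    return [1 if e != 0 else 0 for e in values]

a = [0, 10, 3, 2, 0, 2]
print(by_truthiness(a))    # [0, 1, 1, 1, 0, 1]
print(by_comparison(a))    # [0, 1, 1, 1, 0, 1]

odd = [None, "", 0.0]      # hypothetical non-int input
print(by_truthiness(odd))  # [0, 0, 0]
print(by_comparison(odd))  # [1, 1, 0]
```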

FatihAkici