What I want to convert is something like this:
a = [0, 10, 3, 2, 0, 2]
def covert_to_boolean(a):
    ...
    return a_converted
a_converted = [0, 1, 1, 1, 0, 1]
What would be the easiest way to do this conversion?
To convert to true Booleans, you could just use:
def covert_to_boolean(a):
    return [bool(x) for x in a]
This returns:
[False, True, True, True, False, True]
If you'd prefer them as 0s and 1s, then:
    return [int(bool(x)) for x in a]
would return:
[0, 1, 1, 1, 0, 1]
Not actually suggesting this unless the code is the hottest code in your program, but there are ways to improve on:
def covert_to_boolean(a):
    return [bool(x) for x in a]
    # Or the straightforward way of converting back to 1/0
    return [int(bool(x)) for x in a]
First off, if a is large enough, since int/bool are built-ins implemented in C, you can use map to remove byte code interpreter overhead:
def covert_to_boolean(a):
    return [*map(bool, a)]
    # Or converting back to 1/0
    return [*map(int, map(bool, a))]
Another savings can come from not using the bool constructor (C constructor calls have unavoidable overhead on CPython, even when the result doesn't actually "construct" anything). Replacing it with operator.truth (a plain function taking exactly one argument, which CPython heavily optimizes) can reduce overhead by a further 40%:
>>> import random
>>> from operator import truth
>>> a = random.choices([*[0] * 100, *range(1, 101)], k=1000)
>>> %%timeit -r5
... [bool(x) for x in a]
...
...
248 µs ± 7.82 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map(bool, a)]
...
...
140 µs ± 2.5 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
>>> %%timeit -r5
... [*map(truth, a)]
...
...
81.3 µs ± 3.91 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
map(bool improved on the list comprehension by about 45%, and was in turn beaten by map(truth by 40% (map(truth took almost exactly one third the time of the list comprehension).
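The session above uses IPython's %%timeit magic. For anyone without IPython, here is a rough sketch of the same comparison using only the standard library's timeit module. The dictionary labels and the repeat counts are my own choices, not part of the original measurements, and the lambda wrappers add a small, equal call overhead to every candidate, so expect the absolute numbers to differ from those above.

```python
import random
import timeit
from operator import truth

# Same shape of test data as in the session above: about half zeros.
a = random.choices([*[0] * 100, *range(1, 101)], k=1000)

# Illustrative labels (assumptions) for the three expressions being compared.
candidates = {
    "list comprehension": lambda: [bool(x) for x in a],
    "map(bool)": lambda: [*map(bool, a)],
    "map(truth)": lambda: [*map(truth, a)],
}

# Timing is only meaningful if every candidate produces the same result.
results = [func() for func in candidates.values()]
assert all(r == results[0] for r in results)

for label, func in candidates.items():
    # Best of 5 repeats of 200 calls each, reported in microseconds per call.
    best = min(timeit.repeat(func, number=200, repeat=5)) / 200
    print(f"{label:20s} {best * 1e6:8.1f} µs per call")
```

Taking the minimum of the repeats, rather than the mean, is a common way to reduce noise from other processes on the machine.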
If the result must be an int, we could expand it to [*map(int, map(truth, a))], but again, int is a constructor, and even though it returns singleton values (CPython caches single copies of -5 through 256 as an implementation detail), it still pays constructor overhead (worse, because it can take keyword arguments). There is no equivalent "convert to true int" function like bool has operator.truth, but you can cheat your way into one by "adding to 0":
>>> %%timeit -r5
... [int(bool(x)) for x in a]
...
...
585 µs ± 65.2 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map(int, map(bool, a))]
...
...
363 µs ± 58.6 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map((0).__add__, map(truth, a))]
...
...
168 µs ± 2.2 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
(0).__add__ just takes advantage of the fact that adding a bool to 0 produces either 0 or 1, and __add__ has far lower overhead than a constructor; in this case, the switch from list comprehension to map (even nested map) saved nearly 40%, and switching from int/bool to (0).__add__/truth saved nearly 55% off what remained, for a total reduction in runtime of over 70%.
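To make the "adding to 0" trick concrete, here is a small sketch using the list from the question. It relies on bool being a subclass of int, so int addition accepts a bool and returns a plain int. The variable names are only for illustration.

```python
from operator import truth

a = [0, 10, 3, 2, 0, 2]

# Step 1: truth() maps each element to a real bool.
as_bools = [*map(truth, a)]

# Step 2: the bound method (0).__add__ computes 0 + value; adding a bool
# to an int yields a plain int, so True becomes 1 and False becomes 0.
as_ints = [*map((0).__add__, as_bools)]

print(as_bools)  # [False, True, True, True, False, True]
print(as_ints)   # [0, 1, 1, 1, 0, 1]
print(all(type(v) is int for v in as_ints))  # True
```

Note that (0).__add__ is only safe here because its inputs are already bools; handed an arbitrary non-numeric object it returns NotImplemented instead of raising.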
Again, to be clear, don't do this unless the code really is that hot and the input is reasonably large (if a were only five elements, the setup overhead for calling map would outweigh the tiny savings from avoiding byte code per loop). But when it comes up, it's good to know about. bool is one of the slowest things in Python, in terms of overhead:productive work ratio; int of already int-like things is similarly bad.
There is one last thing to check though. Maybe pushing things to syntax, avoiding function calls, might save more. As it happens, the answer is "it does, for one of them":
>>> %%timeit -r5
... [not not x for x in a] # Worse than map
...
...
122 µs ± 6.6 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
>>> %%timeit -r5
... [0 + (not not x) for x in a] # BETTER than map!!!
...
...
158 µs ± 22.4 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
>>> %%timeit -r5
... [0 + x for x in map(truth, a)] # Somehow not the best of both worlds...
...
...
177 µs ± 5.77 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
While [not not x for x in a] lost to [*map(truth, a)], [0 + (not not x) for x in a] actually beat [*map((0).__add__, map(truth, a))] (as it happens, there is some overhead in (0).__add__ being invoked through a wrapper around the nb_add slot, which can be avoided by actually using + at the Python layer). Mixing the best of each solution (map(truth with 0 + in a list comp) didn't actually benefit us though (re-adding the bytecode overhead was roughly a fixed cost, and inside a comprehension not not beats even operator.truth). Point is, none of this is worth it unless you actually need it, and performance can be unintuitive. I had code that needed it, once upon a time, so you benefit from my testing.
You can use the and operator in a list comprehension to keep the code both fast and readable:
def covert_to_boolean(a):
    return [i and 1 for i in a]
This approach is faster than @ShadowRanger's fastest approach, as demonstrated here: https://repl.it/@blhsing/NeglectedClientsideLanserver
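One thing worth checking before adopting the and approach: x and 1 evaluates to x itself whenever x is falsy, so the result is only guaranteed to contain the ints 0 and 1 when the input contains only ints. A minimal sketch (the helper name convert_with_and is made up for this illustration):

```python
def convert_with_and(values):
    # Same expression as in the answer above, wrapped in a helper
    # whose name is only illustrative.
    return [i and 1 for i in values]

ints_only = convert_with_and([0, 10, 3, 2, 0, 2])
mixed = convert_with_and([0.0, "", None, 7])

print(ints_only)  # [0, 1, 1, 1, 0, 1]
print(mixed)      # [0.0, '', None, 1]
```

For the list of ints in the question this makes no difference, which is why the approach works there.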
def covert_to_boolean(a):
    return [1 if i != 0 else 0 for i in a]
# [0, 1, 1, 1, 0, 1]
# OR
def covert_to_boolean(a):
    return [bool(i) * 1 for i in a]
# [0, 1, 1, 1, 0, 1]
Not sure if you wanted b or c, so here's both
>>> a = [ 0, 10, 3, 2, 0, 2 ]
>>> b = [bool(i) for i in a]
>>> b
[False, True, True, True, False, True]
>>> c = [int(bool(i)) for i in a]
>>> c
[0, 1, 1, 1, 0, 1]
Never mind the lapses in terminology; here is a solution using list comprehension that you can study (assuming you are a student):
a = [2, 0, 12, 45, 0, 0, 99]
b = [1 if i != 0 else 0 for i in a]
print(b)
[1, 0, 1, 1, 0, 0, 1]
If you are trying to convert your values to 0 and 1, I think the most elegant way would be:
a_converted = [1 if e else 0 for e in a]
where you basically check each e in a: if e is truthy (non-zero), assign 1; if it is zero, assign 0.
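The answers in this thread use two slightly different tests, truthiness (if e) and comparison (if e != 0). They agree for a list of ints like the one in the question, but as far as I can tell they diverge for other falsy values. A short sketch with made-up sample data:

```python
# Sample data chosen for illustration; not from the original question.
mixed = [0, 0.0, "", None, [], 5, "0"]

# Truthiness test: every falsy object maps to 0.
by_truthiness = [1 if e else 0 for e in mixed]

# Comparison test: only values equal to 0 map to 0.
by_comparison = [1 if e != 0 else 0 for e in mixed]

print(by_truthiness)  # [0, 0, 0, 0, 0, 1, 1]
print(by_comparison)  # [0, 0, 1, 1, 1, 1, 1]
```

Which one is right depends on whether empty strings, None, and empty containers should count as "zero" in your data.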