
I am trying to find the fastest possible way to convert a binary string into an array of the integers 0 and 1. I am currently using Python 3.8 and have the following two functions to obtain such an array:

import numpy as np
from typing import Literal, Sequence
def string_to_array(Bin_String):
    Bin_array = [int(Bin_String[i], 2) for i in range(len(Bin_String))]
    return Bin_array

def string_to_array_LtSq(string: Sequence[Literal['0', '1']]) -> np.ndarray:
    return np.array([int(c) for c in string])

For a string of length 1024, the string_to_array_LtSq function takes about 20 microseconds less than the other (average 370 microseconds), though I don't understand why it is faster, since both use the int function.

But this is an important part of the code, so is there a faster way in Python?

Also, is it possible to do this faster in another language (for example, C)? I might switch to that language.

Thanks.

Related Post:

  1. Convert Bitstring (String of 1 and 0s) to numpy array
Michael

2 Answers


Try:

import numpy as np

s = '0011'
print(np.frombuffer(s.encode("ascii"), dtype="u1") - 48)
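This works because the string's ASCII bytes for '0' and '1' are 48 and 49, so subtracting 48 maps them straight to the digit values without parsing each character. A quick check (my addition, not part of the original answer):

```python
import numpy as np

s = "0011"
# ord('0') == 48 and ord('1') == 49, so subtracting 48 yields 0 and 1
arr = np.frombuffer(s.encode("ascii"), dtype="u1") - 48
print(arr)        # [0 0 1 1]
print(arr.dtype)  # uint8
```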

Benchmark:

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


def f2():
    return np.array([int(c) for c in s])


def f3():
    return list(map(int, s))


def f4():
    return [int(c) for c in s]


t1 = timeit(f1, number=1_000)
t2 = timeit(f2, number=1_000)
t3 = timeit(f3, number=1_000)
t4 = timeit(f4, number=1_000)

print(t1)
print(t2)
print(t3)
print(t4)

Prints:

0.00223864201689139
0.18963027599966154
0.10751374304527417
0.13433810899732634
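All four variants produce the same values, just in different containers and dtypes; a quick equivalence check (my addition, not part of the original answer):

```python
import numpy as np

s = "1011" * 256  # same input as the benchmark above

# f1's result is a uint8 array; f3/f4 are plain lists of ints
fast = np.frombuffer(s.encode("ascii"), dtype="u1") - 48
assert fast.tolist() == [int(c) for c in s] == list(map(int, s))
```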

EDIT: added functions which create only a Python list (instead of an np.array).

Andrej Kesely

bytearray appears to be even faster than Andrej's NumPy solution, and bytes can be used for a fast list solution. Times with 1024 bits (showing only the first 5 elements):

f1   2.7 μs  [1 0 1 1 1]
f2   2.0 μs  bytearray(b'\x01\x00\x01\x01\x01')
f3   7.6 μs  [1, 0, 1, 1, 1]

Code based on Andrej's (Try it online!):

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


table = bytearray.maketrans(b'01', b'\x00\x01')

def f2():
    return bytearray(s, "ascii").translate(table)


def f3():
    return [*s.encode().translate(table)]


for _ in range(3):
    for f in f1, f2, f3:
        t = timeit(f, number=1_000)
        t = '%5.1f μs ' % (t * 1e3)
        print(f.__name__, t, f()[:5])
    print()
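If a NumPy array is ultimately needed downstream, the translated bytes can also be wrapped in an array without re-parsing each character (a sketch of mine, not from the answer):

```python
import numpy as np

# translation table mapping ASCII '0'/'1' to byte values 0/1
table = bytearray.maketrans(b'01', b'\x00\x01')
s = "1011" * 256

buf = bytes(s, "ascii").translate(table)  # one pass over the string
arr = np.frombuffer(buf, dtype="u1")      # zero-copy view over the translated bytes
assert arr.tolist() == [int(c) for c in s]
```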
Kelly Bundy
  • Nice solution. If the OP doesn't need a list or np.array, this should be the fastest. On my computer (`Python 3.9.7` / `AMD 3700X`) your solution is even faster: `f1 2.2 μs f2 0.9 μs f3 10.6 μs` – Andrej Kesely Jul 03 '22 at 18:00
  • @AndrejKesely Yeah, it's a bit odd to do this without really knowing what they need, especially what they're going to do with it. The different types will also lead to different speeds in their subsequent usage. I feel like that should be taken into consideration as well. – Kelly Bundy Jul 03 '22 at 18:03
  • @AndrejKesely I guess we have everything in this post! But now I am wondering what the problem will be; I need to choose the solution which will not cause any problem in the future. By the way, Kelly, table should be inside the function; that takes some time, but yours is still faster. – Michael Jul 03 '22 at 18:13
  • 2
    @Michael I disagree. The table can and should be created once and (re)used many times. It should not be in the function. – Kelly Bundy Jul 03 '22 at 18:35
  • @AndrejKesely I mean, as an extreme, we could write a class `Array` whose constructor simply stores the string and whose `__getitem__` does the conversion from character to `int` on the fly. Then I could brag about converting to array in 0.3 μs, but the later usage would suffer. So just benchmarking the conversion to "array" seems rather meaningless. – Kelly Bundy Jul 03 '22 at 18:45