
I am trying to find the fastest possible way to convert a binary string into an array of the integers 0 and 1. I am currently using Python 3.8 and have the following two functions to obtain such an array:

import numpy as np
from typing import Literal, Sequence
def string_to_array(Bin_String):
    Bin_array = [int(Bin_String[i], 2) for i in range(len(Bin_String))]
    return Bin_array

def string_to_array_LtSq(string: Sequence[Literal['0', '1']]) -> np.ndarray:
    return np.array([int(c) for c in string])

For a string of length 1024, the string_to_array_LtSq function takes about 20 microseconds less than the other (average 370 microseconds), though I don't understand why it is faster, since both use the int function.

But this is an important part of the code, so is there a faster way in Python?

Also, is it possible to do this faster in another language (for example, C)? I might switch to that language.

Thanks.

Related Post:

  1. Convert Bitstring (String of 1 and 0s) to numpy array
Michael

2 Answers


Try:

import numpy as np

s = '0011'
print(np.frombuffer(s.encode("ascii"), dtype="u1") - 48)
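This works because the string's ASCII bytes for '0' and '1' are 48 and 49, so subtracting 48 maps them straight to the digit values without parsing each character. A quick check (my addition, not part of the original answer):

```python
import numpy as np

s = "0011"
# ord('0') == 48 and ord('1') == 49, so subtracting 48 yields 0 and 1
arr = np.frombuffer(s.encode("ascii"), dtype="u1") - 48
print(arr)        # [0 0 1 1]
print(arr.dtype)  # uint8
```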

Benchmark:

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


def f2():
    return np.array([int(c) for c in s])


def f3():
    return list(map(int, s))


def f4():
    return [int(c) for c in s]


t1 = timeit(f1, number=1_000)
t2 = timeit(f2, number=1_000)
t3 = timeit(f3, number=1_000)
t4 = timeit(f4, number=1_000)

print(t1)
print(t2)
print(t3)
print(t4)

Prints:

0.00223864201689139
0.18963027599966154
0.10751374304527417
0.13433810899732634
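All four variants produce the same values, just in different containers and dtypes; a quick equivalence check (my addition, not part of the original answer):

```python
import numpy as np

s = "1011" * 256  # same input as the benchmark above

# f1's result is a uint8 array; f3/f4 are plain lists of ints
fast = np.frombuffer(s.encode("ascii"), dtype="u1") - 48
assert fast.tolist() == [int(c) for c in s] == list(map(int, s))
```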

EDIT: added functions which create only a Python list (instead of an np.array).

Andrej Kesely

bytearray appears to be even faster than Andrej's NumPy solution, and bytes can be used for a fast list solution. Times with 1024 bits (showing only the first 5 elements):

f1   2.7 μs  [1 0 1 1 1]
f2   2.0 μs  bytearray(b'\x01\x00\x01\x01\x01')
f3   7.6 μs  [1, 0, 1, 1, 1]

Code based on Andrej's (Try it online!):

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


table = bytearray.maketrans(b'01', b'\x00\x01')

def f2():
    return bytearray(s, "ascii").translate(table)


def f3():
    return [*s.encode().translate(table)]


for _ in range(3):
    for f in f1, f2, f3:
        t = timeit(f, number=1_000)
        t = '%5.1f μs ' % (t * 1e3)
        print(f.__name__, t, f()[:5])
    print()
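If a NumPy array is ultimately needed downstream, the translated bytes can also be wrapped in an array without re-parsing each character (a sketch of mine, not from the answer):

```python
import numpy as np

# translation table mapping ASCII '0'/'1' to byte values 0/1
table = bytearray.maketrans(b'01', b'\x00\x01')
s = "1011" * 256

buf = bytes(s, "ascii").translate(table)  # one pass over the string
arr = np.frombuffer(buf, dtype="u1")      # zero-copy view over the translated bytes
assert arr.tolist() == [int(c) for c in s]
```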
Kelly Bundy
  • Nice solution. If the OP doesn't need a list or np.array, this should be the fastest. On my computer (`Python 3.9.7` / `AMD 3700X`) your solution is even faster: `f1 2.2 μs f2 0.9 μs f3 10.6 μs` – Andrej Kesely Jul 03 '22 at 18:00
  • @AndrejKesely Yeah, it's a bit odd to do this without really knowing what they need, especially what they're going to do with it. The different types will also lead to different speeds in their subsequent usage. I feel like that should be taken into consideration as well. – Kelly Bundy Jul 03 '22 at 18:03
  • @AndrejKesely I guess we have everything in this post! But now I am wondering what the problem will be; I need to choose the solution which will not cause any problem in the future. By the way, Kelly, table should be inside the function; that takes some time, but yours is still faster. – Michael Jul 03 '22 at 18:13
  • 2
    @Michael I disagree. The table can and should be created once and (re)used many times. It should not be in the function. – Kelly Bundy Jul 03 '22 at 18:35
  • @AndrejKesely I mean, as an extreme, we could write a class `Array` whose constructor simply stores the string and whose `__getitem__` does the conversion from character to `int` on the fly. Then I could brag about converting to array in 0.3 μs, but the later usage would suffer. So just benchmarking the conversion to "array" seems rather meaningless. – Kelly Bundy Jul 03 '22 at 18:45