
I have a list of byte strings in Python 3 (they are audio chunks). I want to make one big byte string out of them, but the simple implementation is quite slow. How can I do it better?

chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

all_audio = b''
for ch in chunks:
    all_audio += ch

How can I do this faster?

al.zatv

3 Answers


Use bytearray()

from time import time

c = b'\x02\x03\x05\x07' * 500 # test data

# Method-1 with bytes-string

bytes_string = b''

st = time()
for _ in range(10**4):
    bytes_string += c

print("string concat -> took {} sec".format(time()-st))

# Method-2 with bytes-array

bytes_arr = bytearray()

st = time()
for _ in range(10**4):
    bytes_arr.extend(c)
# convert byte_arr to bytes_string via
bytes_string = bytes(bytes_arr)

print("bytearray extend/concat -> took {} sec".format(time()-st))

A benchmark on my machine (Windows 10, Core i7 7th gen) shows:

string concat -> took 67.27699875831604 sec
bytearray extend/concat -> took 0.08975911140441895 sec

The code is fairly self-explanatory: instead of bytes_string += next_block, use bytes_arr.extend(next_block). A bytes object is immutable, so += allocates a new object and copies all the accumulated data on every iteration, while a bytearray is mutable and grows in place. After building the bytearray you can call bytes(bytes_arr) to get an immutable byte string.
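Applied to the original loop, a minimal sketch (assuming the question's hypothetical audio API with audio.ends(), audio.next_buffer() and do_some_chunk_processing()) could look like this:

all_audio = bytearray()
while not audio.ends():
    # extend() grows the mutable buffer in place (amortized O(1) per append)
    all_audio.extend(audio.next_buffer())
    do_some_chunk_processing()

# convert once at the end if an immutable bytes object is needed
all_audio = bytes(all_audio)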

Amin Pial
    Finally a fast solution! I was adding >50,000 chunks of bytes on the fly and got a 140x speed-up by using `bytearray`. – TheLizzard May 15 '23 at 19:46

One approach you could try and measure would be to use bytes.join:

all_audio = b''.join(chunks)

The reason this might be faster is that join does a pre-pass over the chunks to find out how big all_audio needs to be, allocates a buffer of exactly the right size once, and then copies every chunk into it in a single pass.

Reference
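Applied to the question's code, a sketch (again assuming the question's hypothetical audio API) would keep the list of chunks and replace only the second loop:

chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

# join computes the total size first, then copies each chunk exactly once
all_audio = b''.join(chunks)

This avoids the quadratic behavior of the += loop, which re-copies the already-accumulated data on every iteration.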

Wander Nauta

One approach is to use an f-string. Note that f-strings always build str objects (there is no bytes equivalent), so this only applies when the chunks are text rather than bytes:

all_audio = ''
for ch in chunks:
    all_audio = f'{all_audio}{ch}'

This seems to be faster for small strings, according to this comparison.
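If you want to verify that claim on your own data, a small timing sketch with timeit (the function names and chunk data here are just illustrative) might look like:

from timeit import timeit

parts = ['abc'] * 100  # small text chunks

def concat_plus():
    s = ''
    for p in parts:
        s += p
    return s

def concat_fstring():
    s = ''
    for p in parts:
        s = f'{s}{p}'
    return s

print('+=       ->', timeit(concat_plus, number=10_000))
print('f-string ->', timeit(concat_fstring, number=10_000))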

A. Bohyn