
I have a list of byte strings in Python 3 (they are audio chunks). I want to make one big byte string out of them, but the simple implementation is quite slow. How can I do it better?

chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

all_audio = b''
for ch in chunks:
    all_audio += ch

How can I do this faster?

al.zatv

3 Answers


Use bytearray()

from time import time

c = b'\x02\x03\x05\x07' * 500 # test data

# Method-1 with bytes-string

bytes_string = b''

st = time()
for _ in range(10**4):
    bytes_string += c

print("string concat -> took {} sec".format(time()-st))

# Method-2 with bytes-array

bytes_arr = bytearray()

st = time()
for _ in range(10**4):
    bytes_arr.extend(c)
# convert byte_arr to bytes_string via
bytes_string = bytes(bytes_arr)

print("bytearray extend/concat -> took {} sec".format(time()-st))

A benchmark on my machine (Windows 10, Core i7 7th gen) shows:

string concat -> took 67.27699875831604 sec
bytearray extend/concat -> took 0.08975911140441895 sec

The code is fairly self-explanatory: instead of bytes_string += next_block, use bytes_arr.extend(next_block). A bytes object is immutable, so += allocates a new object and copies all the accumulated data on every iteration, while a bytearray is mutable and grows in place. After building the bytearray you can call bytes(bytes_arr) to get an immutable byte string.
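Applied to the original loop, a minimal sketch (assuming the question's hypothetical audio API with audio.ends(), audio.next_buffer() and do_some_chunk_processing()) could look like this:

all_audio = bytearray()
while not audio.ends():
    # extend() grows the mutable buffer in place (amortized O(1) per append)
    all_audio.extend(audio.next_buffer())
    do_some_chunk_processing()

# convert once at the end if an immutable bytes object is needed
all_audio = bytes(all_audio)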

Amin Pial
    Finally a fast solution! I was adding >50,000 chunks of bytes on the fly and got a 140x speed-up by using `bytearray`. – TheLizzard May 15 '23 at 19:46

One approach you could try and measure would be to use bytes.join:

all_audio = b''.join(chunks)

The reason this might be faster is that join does a pre-pass over the chunks to find out how big all_audio needs to be, allocates a buffer of exactly the right size once, and then copies every chunk into it in a single pass.

Reference
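Applied to the question's code, a sketch (again assuming the question's hypothetical audio API) would keep the list of chunks and replace only the second loop:

chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

# join computes the total size first, then copies each chunk exactly once
all_audio = b''.join(chunks)

This avoids the quadratic behavior of the += loop, which re-copies the already-accumulated data on every iteration.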

Wander Nauta

One approach is to use an f-string. Note that f-strings always build str objects (there is no bytes equivalent), so this only applies when the chunks are text rather than bytes:

all_audio = ''
for ch in chunks:
    all_audio = f'{all_audio}{ch}'

This seems to be faster for small strings, according to this comparison.
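If you want to verify that claim on your own data, a small timing sketch with timeit (the function names and chunk data here are just illustrative) might look like:

from timeit import timeit

parts = ['abc'] * 100  # small text chunks

def concat_plus():
    s = ''
    for p in parts:
        s += p
    return s

def concat_fstring():
    s = ''
    for p in parts:
        s = f'{s}{p}'
    return s

print('+=       ->', timeit(concat_plus, number=10_000))
print('f-string ->', timeit(concat_fstring, number=10_000))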

A. Bohyn