I have a dataframe containing millions of floats. I want to convert them to bytes and join them into a single bytes object. Iterating over each value is slow. Is there a way to speed this up?
import struct
import numpy as np
# list of floats [197496.84375, 177091.28125, 140972.3125, 120965.9140625, ...]
# 5M - 20M floats in total
data = df.to_numpy().flatten().tolist()
# too slow
dataline = b''.join([struct.pack('>f', event) for event in data])
I tried another approach, but apart from being slow, it also produces a different result:
import struct
import numpy as np
def myfunc(event):
    return struct.pack('>f', event)
data = df.to_numpy().flatten()
myfunc_vec = np.vectorize(myfunc)
result = myfunc_vec(data)
dataline = b''.join(result)
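A possible explanation for the different result, assuming `np.vectorize` ends up storing the packed values in a fixed-width bytes array (dtype `S4`): NumPy drops trailing null bytes when elements are read back from such an array, so any packed float that ends in `\x00` comes out shorter. The sketch below reproduces the effect with a plain `np.array` rather than `np.vectorize` itself; the sample values `1.0` and `2.0` are chosen only for illustration.

```python
import struct
import numpy as np

# Two floats whose big-endian float32 encodings end in null bytes.
packed = [struct.pack('>f', x) for x in (1.0, 2.0)]
expected = b''.join(packed)      # 4 bytes per float, 8 bytes in total

# A NumPy array of bytes objects gets a fixed-width bytes dtype.
arr = np.array(packed)

# Reading elements back strips trailing b'\x00', so the join is shorter.
joined = b''.join(arr)

print(arr.dtype, len(expected), len(joined))
```

If this is what happens in the vectorized version, the joined output is missing bytes rather than merely ordered differently.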
UPD: I found an example in "Fastest way to pack a list of floats into bytes in python", but it doesn't let me specify endianness. Using '%s>f' instead of '%sf' results in an error:
error: bad char in struct format
import random
import struct
floatlist = [random.random() for _ in range(10**5)]
buf = struct.pack('%sf' % len(floatlist), *floatlist)
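For reference, `struct` expects the byte-order character as the first character of the format string, before the repeat count, which is why `'%s>f'` is rejected. A minimal sketch of two ways to get big-endian output without a Python-level loop follows; the sample values are taken from the list comment earlier in the question, and the NumPy variant assumes the data is purely numeric and that 32-bit precision is what is wanted.

```python
import struct
import numpy as np

floatlist = [197496.84375, 177091.28125, 140972.3125, 120965.9140625]

# Reference result: the original per-element loop.
reference = b''.join([struct.pack('>f', x) for x in floatlist])

# struct: byte-order prefix first, then the count, then the type code.
buf_struct = struct.pack('>%df' % len(floatlist), *floatlist)

# NumPy: convert to big-endian float32 and dump the raw buffer.
buf_numpy = np.asarray(floatlist, dtype='>f4').tobytes()

print(buf_struct == reference, buf_numpy == reference)
```

With a dataframe, the NumPy variant would presumably look like `df.to_numpy().astype('>f4').tobytes()`, which serializes the values in row-major order, the same order `flatten()` produces.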