
I have a dataframe containing millions of floats. I want to pack each one into bytes and concatenate them into a single bytes object. Iterating over each of them is quite slow. Is there a way to speed this up?

import struct
import numpy as np


# list of floats [197496.84375, 177091.28125, 140972.3125, 120965.9140625, ...]
# 5M - 20M floats in total
data = df.to_numpy().flatten().tolist()

# too slow
dataline = b''.join([struct.pack('>f', event) for event in data])
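
A minimal sketch of a vectorized alternative, assuming 32-bit precision is acceptable (the `'>f'` format is a 32-bit float anyway): NumPy can cast the whole array to a big-endian float32 dtype and serialize it in one call, avoiding the per-element Python loop entirely.

```python
import struct
import numpy as np

floats = [197496.84375, 177091.28125, 140972.3125]
arr = np.asarray(floats, dtype=np.float32)

# '>f4' is big-endian 32-bit float; tobytes() serializes the whole
# array in C, without a Python-level loop
dataline = arr.astype('>f4').tobytes()
```

This should produce byte-for-byte the same output as the `struct.pack('>f', ...)` loop, since both round each value to an IEEE 754 binary32 and emit it big-endian.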

I tried another approach, but apart from being slow, it also produces a different result:

import struct
import numpy as np


def myfunc(event):
    return struct.pack('>f', event)


data = df.to_numpy().flatten()

myfunc_vec = np.vectorize(myfunc)
result = myfunc_vec(data)
dataline = b''.join(result)
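
A likely explanation for the different result, as a sketch: without an explicit output type, `np.vectorize` infers a fixed-width bytes dtype (`'S4'`) for the packed values, and NumPy's `S` dtype silently strips trailing null bytes, corrupting any float whose encoding ends in `\x00`. Passing `otypes=[object]` keeps each result as a plain Python `bytes` object.

```python
import struct
import numpy as np

def myfunc(event):
    return struct.pack('>f', event)

data = np.array([1.0, 2.0, 0.5], dtype=np.float64)

# otypes=[object] prevents coercion to an 'S4' dtype, which would
# strip trailing null bytes from the packed values
myfunc_vec = np.vectorize(myfunc, otypes=[object])
dataline = b''.join(myfunc_vec(data))
```

Note that `np.vectorize` is still a Python-level loop under the hood, so this fixes correctness but not speed.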

UPD: I found an example here: Fastest way to pack a list of floats into bytes in python, but it doesn't let me specify endianness. Putting '%s>f' instead of '%sf' results in an error: struct.error: bad char in struct format

import random
import struct


floatlist = [random.random() for _ in range(10**5)]
buf = struct.pack('%sf' % len(floatlist), *floatlist)
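
The error above comes from where the byte-order character is placed: in a `struct` format string it must be the very first character, before the repeat count, so the format should be `'>%df'` rather than `'%d>f'`. A sketch of the corrected call:

```python
import random
import struct

floatlist = [random.random() for _ in range(10**5)]

# byte-order char first, then repeat count, then the type code:
# '>100000f' = big-endian, 100000 32-bit floats
buf = struct.pack('>%df' % len(floatlist), *floatlist)
```

This packs all values in a single C-level call and matches the per-element `struct.pack('>f', ...)` output.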
