Time comparison of @Cedric Poulet's solution (all kudos to him, see his answer), with array splitting added so it returns the result in the desired form, against another numpy approach I thought of at first (create an array of zeros and insert the data in place):
import math
import time

import numpy as np


def time_measure(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        stop = time.time()
        print(f"Elapsed time: {stop-start}")
        return result

    return wrapper


@time_measure
def pad_and_chunk(array, chunk_size: int):
    padded_array = np.zeros(len(array) + (chunk_size - len(array) % chunk_size))
    padded_array[: len(array)] = array
    return np.split(padded_array, len(padded_array) / chunk_size)


@time_measure
def resize(array, chunk_size: int):
    # resize mutates the array in place; refcheck=False allows this
    # even though other references to it exist
    array.resize(len(array) + (chunk_size - len(array) % chunk_size), refcheck=False)
    return np.split(array, len(array) / chunk_size)


@time_measure
def makechunk4(l, chunk):
    l.resize((math.ceil(l.shape[0] / chunk), chunk), refcheck=False)
    return l.reshape(chunk, -1)


if __name__ == "__main__":
    array = np.random.rand(1_000_000)
    ret = pad_and_chunk(array, 3)
    ret = resize(array, 3)
    ret = makechunk4(array, 3)
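For clarity (separate from the timing), here is a small sketch of what the pad-and-split scheme returns on a toy input; the five-element array below is just an illustration:

```python
import numpy as np

def pad_and_chunk(array, chunk_size):
    # Zero-pad to a multiple of chunk_size, then split into equal chunks
    # (same padding scheme as in the benchmark above, without the timing decorator).
    padded = np.zeros(len(array) + (chunk_size - len(array) % chunk_size))
    padded[: len(array)] = array
    return np.split(padded, len(padded) // chunk_size)

chunks = pad_and_chunk(np.arange(5, dtype=float), 3)
# → two chunks: [0, 1, 2] and [3, 4, 0] (the last one zero-padded)
```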
EDIT-EDIT
Gathering all the answers: it is indeed the case that np.split
is horribly slow compared to reshape.
Elapsed time: 0.3276541233062744
Elapsed time: 0.3169224262237549
Elapsed time: 1.8835067749023438e-05
The way the data is padded is not essential; it is the split that takes up most of the time.
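A minimal illustration of why reshape wins, as I understand it: reshape returns a single view of the buffer in constant time, whereas np.split slices out one sub-view per chunk in a Python-level loop (the array below is a toy example):

```python
import numpy as np

a = np.zeros(12)

# np.split builds a Python list, creating one sliced view per chunk.
parts = np.split(a, 4)

# reshape returns a single view over the whole buffer, no per-chunk work.
view = a.reshape(4, 3)

# Neither copies the data: both results share memory with the original,
# but split pays the per-chunk overhead of constructing each view.
assert all(np.shares_memory(part, a) for part in parts)
assert np.shares_memory(view, a)
```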