I have a folder of thousands of pickled one-dimensional numpy arrays, each of length 921603, containing integer values (up to 3 digits each).
Like so:
```
folder/
|0.pkl
|1.pkl
|2.pkl
...
|5000.pkl
```
The goal is to convert them into a single merged.csv file, where each datapoint (one pickled numpy array) becomes one row of the output file.
The super inefficient approaches I have tried so far:
Loading each pickle and iterating through its values to build a row string, which is then appended to the csv file (a sketch of that loop is below). :(
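Roughly, the loop looked like this (a minimal sketch; the range and file names follow the folder layout above):

```python
import pickle
import os

# Slow: builds one huge string per array in pure Python and
# re-opens merged.csv in append mode for every single file.
for i in range(5001):
    with open(os.path.join("folder", f"{i}.pkl"), "rb") as f:
        arr = pickle.load(f)
    row = ",".join(str(v) for v in arr)  # ~921603 str() calls per row
    with open("merged.csv", "a") as out:
        out.write(row + "\n")
```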
Using `numpy.savetxt()` also did not work out as smoothly as I had hoped...
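For reference, this is roughly the `savetxt` variant I tried (a sketch; the exact `fmt` string is an assumption):

```python
import pickle
import numpy as np

# savetxt treats a 1-D array as a column by default, so each array
# has to be reshaped into a single row before writing.
with open("merged.csv", "ab") as out:
    for i in range(5001):
        with open(f"folder/{i}.pkl", "rb") as f:
            arr = pickle.load(f)
        np.savetxt(out, arr.reshape(1, -1), fmt="%d", delimiter=",")
```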
The final goal is to get a merged file that can serve as training data for TensorFlow, so I would also welcome ideas for different and possibly better-optimized ways of packaging the datapoints.
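One direction I have wondered about instead of csv entirely (just an idea, not benchmarked; the shape and dtype below are assumptions based on the description above) is stacking everything into a single .npy file through a memory map:

```python
import pickle
import numpy as np

n_files, n_values = 5001, 921603

# Preallocate a memory-mapped .npy on disk and fill it row by row;
# int16 comfortably covers 3-digit integer values.
merged = np.lib.format.open_memmap(
    "merged.npy", mode="w+", dtype=np.int16, shape=(n_files, n_values)
)
for i in range(n_files):
    with open(f"folder/{i}.pkl", "rb") as f:
        merged[i] = pickle.load(f)
merged.flush()
```

Would something along those lines (or TFRecord files) be a better fit for TensorFlow than one giant csv?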
I would really appreciate any comments and ideas!