I'm working with histograms stored as pandas Series. Each one represents the realizations of a random variable from a set of observations. I'm looking for an efficient way to store them and read them back.
The histogram's bins are the index of the Series. For example:
histogram1:
(-1.3747106810983318, 3.529160051186781] 0.012520
(3.529160051186781, 8.433030783471894] 0.013830
(8.433030783471894, 13.336901515757006] 0.016495
(13.336901515757006, 18.24077224804212] 0.007194
(18.24077224804212, 23.144642980327234] 0.041667
(23.144642980327234, 28.048513712612344] 0.000000
I would like to store several of these histograms in a single CSV file (one file per set of random variables, with each file holding ~100 histograms) and read them back later exactly as they were before storing: each histogram from the file as a single Series, with all values as floats.
How can I do this? Since speed matters, is there a more efficient format than CSV?
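A minimal sketch of one way the CSV round trip could work, assuming each histogram's index is a pandas IntervalIndex closed on the right (as the printed bins suggest) and that every histogram has a unique string name. The helper names `histograms_to_frame` and `frame_to_histograms`, the column names, and the in-memory buffer standing in for a real file are illustrative choices, not an established API. The idea is to store the bin edges as two float columns and rebuild the intervals on read, passing `float_precision="round_trip"` to `read_csv` so the edges are parsed back to the same floats.

```python
import io

import pandas as pd


def histograms_to_frame(histograms):
    """Flatten a dict {name: Series with IntervalIndex} into one long table."""
    parts = []
    for name, hist in histograms.items():
        parts.append(pd.DataFrame({
            "name": name,                           # hypothetical column names
            "left": hist.index.left.to_numpy(),     # lower bin edges
            "right": hist.index.right.to_numpy(),   # upper bin edges
            "value": hist.to_numpy(),
        }))
    return pd.concat(parts, ignore_index=True)


def frame_to_histograms(frame, closed="right"):
    """Rebuild one Series per histogram, indexed by an IntervalIndex."""
    histograms = {}
    for name, group in frame.groupby("name", sort=False):
        index = pd.IntervalIndex.from_arrays(
            group["left"].to_numpy(), group["right"].to_numpy(), closed=closed
        )
        histograms[name] = pd.Series(group["value"].to_numpy(), index=index, name=name)
    return histograms


# Small example built from the first three bins shown above.
edges = [-1.3747106810983318, 3.529160051186781, 8.433030783471894, 13.336901515757006]
original = pd.Series(
    [0.012520, 0.013830, 0.016495],
    index=pd.IntervalIndex.from_breaks(edges),  # closed="right" by default
    name="histogram1",
)

buffer = io.StringIO()  # stands in for a real file path
histograms_to_frame({"histogram1": original}).to_csv(buffer, index=False)
buffer.seek(0)
restored = frame_to_histograms(pd.read_csv(buffer, float_precision="round_trip"))
```

If CSV is not a hard requirement, a binary format such as pickle (`Series.to_pickle` / `pandas.read_pickle`) keeps the IntervalIndex intact without any rebuilding. Whether it is actually faster for ~100 histograms per file would need to be measured.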
The use case: when a new realization of a variable comes in, I would retrieve its histogram from the corresponding file and find the bin that it falls in. Something like this:
    # Not very elegant: linear scan over every bin
    for interval in histogram1.index:
        if 1.0232545 in interval:
            print("It's in!")
            print(histogram1.loc[interval])
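For comparison, a sketch of the same lookup without the explicit loop, assuming the index is a non-overlapping pandas IntervalIndex. `IntervalIndex.get_loc` accepts a scalar and returns the position of the interval containing it, raising `KeyError` when the value lies outside every bin. The helper name `bin_value` is hypothetical.

```python
import pandas as pd

# First three bins from the example above, closed on the right.
edges = [-1.3747106810983318, 3.529160051186781, 8.433030783471894, 13.336901515757006]
histogram1 = pd.Series(
    [0.012520, 0.013830, 0.016495],
    index=pd.IntervalIndex.from_breaks(edges),
)


def bin_value(histogram, x):
    """Return (bin, value) for the bin containing x, or None if x is in no bin."""
    try:
        position = histogram.index.get_loc(x)
    except KeyError:
        return None
    return histogram.index[position], histogram.iloc[position]


result = bin_value(histogram1, 1.0232545)
if result is not None:
    interval, value = result
    print("It's in!", interval, value)
```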
Thanks!