I have 40 data sets, each about 115 MB in size, and I would like to plot them all overlaid on the same plot on a log-log scale.
# make example data
import numpy as np
data_x = []
data_y = []
for _ in range(40):
    x, y = np.random.random(size=(2, int(7e6)))  # 7e6 chosen to give roughly 115 MB per data set
    data_x.append(x)
    data_y.append(y)
    del x, y
# now show the size of one set in MB
print((data_x[0].nbytes + data_y[0].nbytes)/1e6, 'MB')
# 112.0 MB
My computer has about 30 GB of available RAM, so I fully expect the 40 × 112 MB ≈ 4.5 GB of data to fit in memory.
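As a sanity check, the total footprint of the raw arrays can be computed directly, reusing nbytes as above:

print(sum(a.nbytes for a in data_x + data_y) / 1e9, 'GB')
# 4.48 GB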
I would like to make an overlaid log-log plot of every data set:
import matplotlib.pyplot as plt
for x, y in zip(data_x, data_y):
    plt.loglog(x, y)
plt.show()
But the memory overhead of plotting this way is too large.
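For reference, a minimal way to watch the resident memory while the figure is built, assuming psutil is available (this is only illustrative, not an exact measurement script):

import os
import psutil
import matplotlib.pyplot as plt

proc = psutil.Process(os.getpid())

def rss_gb():
    # resident set size of this process, in GB
    return proc.memory_info().rss / 1e9

print('before plotting:', rss_gb(), 'GB')
for x, y in zip(data_x, data_y):
    plt.loglog(x, y)
print('after adding the lines:', rss_gb(), 'GB')
plt.show()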
I'd prefer not to downsample the data. Is there a way to reduce the memory overhead so that I can plot this ~4.5 GB of data?
I would prefer to keep the for loop, because I need to set the point style and color of each data set individually (roughly as in the sketch below), so concatenating the data sets into a single array is not a good option.
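The per-data-set styling I have in mind looks roughly like this (the viridis colormap and the dot marker here are just placeholders, not my real settings):

import matplotlib.pyplot as plt

for i, (x, y) in enumerate(zip(data_x, data_y)):
    # each data set gets its own marker and color
    plt.loglog(x, y, linestyle='none', marker='.', markersize=1,
               color=plt.cm.viridis(i / len(data_x)))
plt.show()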
The most similar question I could find is here, but it differs in that its loop creates separate plots rather than adding to a single plot, so putting a plt.clf() call inside the loop does not help me.