I'm importing point cloud data into MonetDB using MonetDBLite and Python. Since some processing steps are pretty CPU-intensive, I'm parallelizing the processing across two cores. At the end of the preprocessing, the data is loaded into MonetDB from a Pandas DataFrame.
As long as the Python process is active, the size of the database on disk increases with each insert. But as soon as the process/worker terminates, the database size on disk shrinks back to 1.5 MB.
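For reference, the on-disk size can be checked with something like the following (the walk over the database directory is just an illustrative snippet, not part of the actual pipeline):

import os

# Sum the sizes of all files under the MonetDBLite database directory.
# Illustrative helper only; the path is the one passed to monetdblite.init().
def database_size(path="./database/"):
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total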
How can I make the changes persistent?
This is a rough simplification of the workflow:
import numpy
import pandas
import monetdblite
from multiprocessing import Pool

def process(item):
    # preprocessing of item...
    x, y = numpy.meshgrid(numpy.arange(1000), numpy.arange(1000))
    z = numpy.random.rand(1000000)
    data = pandas.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z})
    # each worker opens its own client connection for the insert
    conn = monetdblite.connectclient()
    monetdblite.insert('points', data, client=conn)
    del conn

datalist = [...]
monetdblite.init("./database/")
with Pool(processes=2, maxtasksperchild=1) as p:
    p.map(process, datalist, 1)
monetdblite.shutdown()