
I'm importing point cloud data into MonetDB using MonetDBLite and Python. Since some processing steps are pretty CPU-intensive, I'm parallelizing the processing across two cores. At the end of the preprocessing, the data is loaded into MonetDB from a Pandas data frame.

As long as the Python process is active, the size of the database on disk increases with each insert. But as soon as the process/worker terminates, the disk size shrinks back to 1.5 MB.
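For reference, this is roughly how I watch the size on disk (the helper name db_size_bytes and the ./database/ path are just for illustration):

import os

def db_size_bytes(path="./database/"):
    # sum the sizes of all files under the database directory (rough on-disk footprint)
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total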

How can I make the changes persistent?

This is a rough simplification of the workflow:

import numpy
import pandas
import monetdblite
from multiprocessing import Pool

def process(item):
    # preprocessing...
    x, y = numpy.meshgrid(numpy.arange(1000), numpy.arange(1000))
    z = numpy.random.rand(1000000)
    # flatten the meshgrid output so all columns are 1-D
    data = pandas.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z})
    conn = monetdblite.connectclient()
    monetdblite.insert('points', data, client=conn)
    del conn

datalist = [...]
monetdblite.init("./database/")
with Pool(processes=2, maxtasksperchild=1) as p:
    p.map(process, datalist, 1)
monetdblite.shutdown()
  • I filed a bug at [MonetDBLite/Python](https://github.com/MonetDB/MonetDBLite-Python/issues/40) – sedot May 20 '19 at 13:44

0 Answers