
Is there some way to lock a zarr store when using append?

I have already found out the hard way that using append with multiple processes is a bad idea (the batches being appended aren't aligned with the chunk size of the store). The reason I'd like to use multiple processes is that I need to transform the original arrays before appending them to the zarr store. It would be nice to block other processes from writing concurrently, but still perform the transformations in parallel and then append their data in series.
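For context, this is roughly the pattern I'm after, as a minimal sketch rather than my actual code: the `transform` function, the array shape, chunks and dtype are all placeholders.

```python
import numpy as np
import zarr
from concurrent.futures import ProcessPoolExecutor


def transform(batch):
    # Placeholder for the real, CPU-heavy transformation.
    return batch * 2.0


if __name__ == "__main__":
    # An empty, appendable array; shape, chunks and dtype are made up.
    arr = zarr.open("data.zarr", mode="w", shape=(0, 8),
                    chunks=(1024, 8), dtype="f8")
    batches = [np.random.rand(500, 8) for _ in range(10)]

    # Transform in parallel, but append from the main process only,
    # so the writes happen strictly in series.
    with ProcessPoolExecutor() as pool:
        for result in pool.map(transform, batches):
            arr.append(result)
```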

Edit:

Thanks to jdehesa's suggestion, I became aware of the synchronization section of the documentation. At array creation in the main process, I passed a ProcessSynchronizer pointing to a folder on disk, then spawned a bunch of worker processes with concurrent.futures and passed the array to all the workers so each could append its results. I could see that the ProcessSynchronizer did something, since the folder I pointed it to filled with files, but the array my workers wrote to still ended up missing rows (compared to when it is written from a single process).
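In sketch form, the attempt looks roughly like this (paths, shapes and chunks are stand-ins; in my actual code I passed the array object to the workers instead of re-opening it, with the same result). The comment at the end records my current suspicion, not something I've confirmed:

```python
import numpy as np
import zarr
from concurrent.futures import ProcessPoolExecutor


def worker(batch):
    # Re-open the array in each worker; the synchronizer points at the
    # same lock folder, so chunk writes should be coordinated on disk.
    sync = zarr.ProcessSynchronizer("example.sync")
    arr = zarr.open("example.zarr", mode="a", synchronizer=sync)
    arr.append(batch * 2.0)  # transform, then append directly


if __name__ == "__main__":
    sync = zarr.ProcessSynchronizer("example.sync")
    arr = zarr.open("example.zarr", mode="w", shape=(0, 8),
                    chunks=(1024, 8), dtype="f8", synchronizer=sync)
    batches = [np.random.rand(500, 8) for _ in range(10)]
    with ProcessPoolExecutor() as pool:
        list(pool.map(worker, batches))
    # Expected 5000 rows, but some are missing. My suspicion: append()
    # resizes the array and then writes; the synchronizer locks chunk
    # writes, but two concurrent resizes can both read the old shape,
    # so appends can overwrite each other's rows.
    print(arr.shape)
```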

sobek
    I suppose you have to pass a [`synchronizer`](https://zarr.readthedocs.io/en/stable/api/sync.html) on [creation](https://zarr.readthedocs.io/en/stable/api/creation.html) to protect write access. – jdehesa May 14 '20 at 14:24
  • @jdehesa Thanks for your comment, it helped me find the relevant documentation. – sobek May 14 '20 at 14:32
  • @jdehesa Tried the `ProcessSynchronizer`. I can see that it writes some files to its folder, but it had no effect on the output; I'm still missing rows in the result. I also disabled threading in blosc, which didn't solve the issue either. – sobek May 14 '20 at 15:32

0 Answers