
I'm doing some research in neuroscience, and I'm using Python's TinyDB library to keep track of all my model training runs and the data they generate.

One issue I realized might come up is training multiple models on a cluster: two threads might try to write to the TinyDB JSON file at the same time.

Can someone please let me know if this will be an issue?

theideasmith
  • Yes. You will lose some or all of your data if you try to write to a flat-file "database" concurrently. Architect your program differently. – MattH May 12 '17 at 13:50
  • What are better lightweight databases to use in Python? – theideasmith May 12 '17 at 13:51
  • sqlite (http://www.sqlite.org/lockingv3.html) can handle concurrency at reasonable levels, depending on your use case, and is already included in Python by default (https://docs.python.org/2/library/sqlite3.html). – valentin May 12 '17 at 13:52

2 Answers


Python offers synchronization primitives such as locks, rlocks, conditions and semaphores for processes, threads and coroutines. If your threads access one or more shared variables, every thread should acquire a lock before touching them, so that no other thread can access them at the same time.
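As a minimal sketch of that idea, assuming all writers are threads inside a single Python process, a `threading.Lock` can guard the read-modify-write of a shared JSON file. The names `RESULTS_PATH`, `write_lock`, `record_run` and `load_runs` are made up for illustration and are not part of TinyDB. Note that a lock like this does not protect against separate processes or separate cluster nodes writing to the same file.

```python
import json
import os
import tempfile
import threading

# Hypothetical shared results file and lock; the names are illustrative only.
RESULTS_PATH = os.path.join(tempfile.mkdtemp(), "runs.json")
write_lock = threading.Lock()


def record_run(run):
    """Append one run record to the shared JSON file while holding the lock."""
    with write_lock:  # only one thread at a time may read, modify and rewrite
        if os.path.exists(RESULTS_PATH):
            with open(RESULTS_PATH) as f:
                runs = json.load(f)
        else:
            runs = []
        runs.append(run)
        with open(RESULTS_PATH, "w") as f:
            json.dump(runs, f)


def load_runs():
    """Read back every record stored so far."""
    with open(RESULTS_PATH) as f:
        return json.load(f)


# Eight threads each record one run; the lock serializes their writes.
threads = [
    threading.Thread(target=record_run, args=({"run_id": i},))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the `with write_lock:` line, two threads could both read the old file contents and one record would silently overwrite the other.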

Petr Javorik

Paraphrased question: Can I update a JSON file concurrently?

Answer: No

Suggestions:

  1. Use a file locking system to prevent simultaneous read/write of the aggregated results.
  2. Have each unit of work output to its own results file, and run a separate job to aggregate results as needed.
  3. Use a thread-safe database, e.g. sqlite3.
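Suggestion 2 could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the helper names `write_run_result` and `aggregate_results`, the `run-<id>.json` naming scheme and the example result fields are all hypothetical. Each training run writes to a file that no other run touches, so no two writers ever share a file.

```python
import json
import os
import tempfile
import uuid


def write_run_result(results_dir, result):
    """Write one run's result to its own uniquely named JSON file."""
    os.makedirs(results_dir, exist_ok=True)
    # A random UUID in the file name keeps concurrent runs from colliding.
    path = os.path.join(results_dir, "run-{}.json".format(uuid.uuid4().hex))
    with open(path, "w") as f:
        json.dump(result, f)
    return path


def aggregate_results(results_dir):
    """Collect every per-run file into one list (meant to run as a separate job)."""
    results = []
    for name in sorted(os.listdir(results_dir)):
        if name.startswith("run-") and name.endswith(".json"):
            with open(os.path.join(results_dir, name)) as f:
                results.append(json.load(f))
    return results


# Example usage with a temporary directory and made-up result fields.
results_dir = tempfile.mkdtemp()
write_run_result(results_dir, {"model": "a", "loss": 0.5})
write_run_result(results_dir, {"model": "b", "loss": 0.25})
all_results = aggregate_results(results_dir)
```

The aggregation step should only run once the training jobs have finished writing, otherwise it may pick up a partially written file.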
MattH