
I have a tkinter app with about 1000 users which is collecting data from a game. Every 30 seconds, it saves the collected data to a json file. After having a couple of users experiencing corrupted save files after power outages, I tried implementing a safe json saving function that would always ensure an intact file.

However, after implementing this function a few other users (maybe 1 out of 100) are now getting permission errors during the autosave function. I am not able to replicate the bug on my own machine.

I use the function below for saving, which was inspired by this thread: How to make file creation an atomic operation?

import json
import os
from typing import Union


def json_dump(dest_file: str, data: Union[dict, list]) -> None:
    tmp_file = dest_file[:-5] + '_tmp.json'
    with open(tmp_file, 'w') as fo:
        json.dump(data, fo, indent=2)
        fo.flush()
        os.fsync(fo.fileno())
    os.replace(tmp_file, dest_file)

Error message that a user sent me (that I cannot replicate myself, and that most users are not experiencing)

19:46:02,481 root ERROR (<class 'PermissionError'>, PermissionError(13, 'Access is denied'), <traceback object at 0x0000019E641C6700>)
Traceback (most recent call last):
File "tkinter\__init__.py", line 1883, in __call__
File "main_frame.py", line 168, in <lambda>
File "main_frame.py", line 432, in notebook_tab_change
File "main_frame.py", line 541, in SaveActiveState
File "utils\other_utils.py", line 60, in json_dump
PermissionError: [WinError 5] Access is denied: 'Profiles/PIT_tmp.json' -> 'Profiles/PIT.json'

The save-to-file function can be reached by multiple threads (scheduled through tkinter's `after` method). The main thread handles user interaction with the UI, which can result in a call to the save function (when modifying data manually, or when quitting the app). Another thread autosaves every 30 seconds, and thus calls the save function as well.

I wonder if the error happens because multiple threads are trying to save the file at the same time? But then again, I don't understand why it did not happen when I just did a standard json dump...

oskros
    What about using a database (designed for multi-user read/write); rather than a JSON *file*? – S3DEV Jan 01 '21 at 15:15
  • The data is stored locally on each machine. When I mentioned the 1000 users, they are all running a local copy of the program. But the program itself has multiple threads. There is a main thread for handling UI input (which can result in a call to the save function) and there is another thread autosaving (and thus calling the save function) every 30 seconds. – oskros Jan 01 '21 at 15:20
  • 1
    I recommend looking into sqlite, to avoid reinventing the wheel. – kaya3 Jan 01 '21 at 15:23
  • `sqlite3` is also blocking. Only one thread at a time can read/write. I don't see the benefit. – wuerfelfreak Jan 01 '21 at 16:05
  • Note that this is a **very** Windows-y problem; other operating systems don't enforce this restriction on you. – Charles Duffy Jan 01 '21 at 21:56
  • @ErBhaskarDuttSharma Even if the OP were on Linux, `chmod 777` should **never** be used or recommended; it makes content writable even by completely untrusted system users (like `nobody`, which is often used to run authentication code for not-yet-authenticated inbound network connections) at the same time as it's runnable even by highly-privileged users. That makes the practice a fast route to major security incidents: when I'm playing red-team in an attack scenario, one of the very first things I (or anyone else) looks for on getting into a low-privileged account is writable files that are read or executed by higher-privileged accounts. – Charles Duffy Jan 01 '21 at 21:57
  • 1
    @wuerfelfreak, it's blocking, sure, but sqlite3 can do incremental updates, meaning you don't need to block for long enough to _write the whole file_, but just long enough to write the _specific new data_ and update the indices. Also, sqlite supports locking individual _regions_ of a file, rather than the whole thing. – Charles Duffy Jan 01 '21 at 22:00
  • Instead of opening and closing the file in each thread, open the file once in main, then simply write the data to the file inside the threads. This will hopefully resolve the permission error but may cause some overwrites but do give it a try. – glory9211 Jan 01 '21 at 22:53
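The sqlite suggestion from the comments could look something like the minimal sketch below. The names `save_state`/`load_state` and the single key/value table are hypothetical, not from the thread; the relevant property is that sqlite3 serializes concurrent writers internally, so simultaneous saves block (up to `timeout`) instead of corrupting the file or failing with a permission error.

```python
import json
import sqlite3


def save_state(db_path: str, key: str, data) -> None:
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS state "
                "(key TEXT PRIMARY KEY, payload TEXT)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO state (key, payload) VALUES (?, ?)",
                (key, json.dumps(data)),
            )
    finally:
        conn.close()


def load_state(db_path: str, key: str):
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        row = conn.execute(
            "SELECT payload FROM state WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None
    finally:
        conn.close()
```

Crash safety comes for free here: sqlite's journal ensures the database is never left half-written after a power outage, which is what the original temp-file dance was hand-rolling for JSON.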

1 Answer


To me this looks like a threading issue. Because you always use the same temp file name, you get a race condition at this line:

os.replace(tmp_file, dest_file)

When two threads reach this line at the same time, the first thread renames tmp_file, so it no longer exists when the second thread tries to do the same. (Btw: why are you using os.replace instead of os.rename? I switched to the latter and could not see any difference.)

The issue can be reproduced by starting multiple threads or processes. I will use the ProcessPoolExecutor here since, in my experiments, solutions that worked for processes always worked for threads as well, but not the other way round.

import concurrent.futures

# json_dump as defined above, placed in a local demo package
from thread_demo.dumper import json_dump

if __name__ == "__main__":

    with concurrent.futures.ProcessPoolExecutor() as executor:
        # fire five concurrent saves to the same destination file
        futures = {
            executor.submit(json_dump, "demo.json", {"random": f"data {i}"}): i
            for i in range(5)
        }

        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.ALL_COMPLETED
        )

        for d in done:
            print(f"{d}: successful: {d.result()}")

On my machine, I usually get 1-2 successful runs out of 5:

fail [Errno 2] No such file or directory: 'demo_tmp.json' -> 'demo.json' for {'random': 'data 0'}
fail [Errno 2] No such file or directory: 'demo_tmp.json' -> 'demo.json' for {'random': 'data 3'}
fail [Errno 2] No such file or directory: 'demo_tmp.json' -> 'demo.json' for {'random': 'data 4'}
<Future at 0x7fbd3056f2b0 state=finished returned bool>: successful: False
<Future at 0x7fbd3056f0f0 state=finished returned bool>: successful: True
<Future at 0x7fbd30563fd0 state=finished returned bool>: successful: True
<Future at 0x7fbd30563160 state=finished returned bool>: successful: False
<Future at 0x7fbd3056f1d0 state=finished returned bool>: successful: False

We see a 'No such file' error - but I am running under Linux. The PermissionError under Windows might stem from the same issue.

I added an additional try/except statement to json_dump, plus a boolean return value to indicate success, for nicer output.

You could solve your problem (if it is indeed caused by multiple threads) by using a mutex from a multiprocessing.Manager(). However, an even simpler fix might be to use a random name for the temp file.
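For the question's threads-only setup (no separate processes), a plain module-level threading.Lock around the original function would also serialize the saves. A minimal sketch, keeping the question's fixed temp file name (`json_dump_locked` is a hypothetical name, not from the thread):

```python
import json
import os
import threading

# One lock shared by every thread in the process.
_save_lock = threading.Lock()


def json_dump_locked(dest_file: str, data) -> None:
    # Holding the lock across the whole write-and-rename sequence means
    # two threads can never race on the same temp file name.
    with _save_lock:
        tmp_file = dest_file[:-5] + '_tmp.json'
        with open(tmp_file, 'w') as fo:
            json.dump(data, fo, indent=2)
            fo.flush()
            os.fsync(fo.fileno())
        os.replace(tmp_file, dest_file)
```

Note that this only protects against threads within one process; the random temp file name below also covers the multi-process case.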

You can do this by hand or use a NamedTemporaryFile. Beware that you need to pass the dir argument: at least on Linux, an 'Invalid cross-device link' error occurs when the temp file is created on a different device than the destination.

Your updated json_dump becomes:

import json
import os
import tempfile
from typing import Union


def json_dump(dest_file: str, data: Union[dict, list]) -> bool:
    # delete=False keeps the file around so it can be renamed afterwards;
    # dir="." keeps it on the same filesystem as the destination
    with tempfile.NamedTemporaryFile("w", dir=".", delete=False) as fo:
        json.dump(data, fo, indent=2)
        fo.flush()
        os.fsync(fo.fileno())
    try:
        os.rename(fo.name, dest_file)
    except Exception as e:
        print(f"fail {e} for {data}")
        return False
    return True

Works on my machine :)

<Future at 0x7f0ef60f61d0 state=finished returned bool>: successful: True
<Future at 0x7f0ef60f62b0 state=finished returned bool>: successful: True
<Future at 0x7f0ef60f60f0 state=finished returned bool>: successful: True
<Future at 0x7f0ef60eb160 state=finished returned bool>: successful: True
<Future at 0x7f0ef60ebfd0 state=finished returned bool>: successful: True

Can you reproduce the PermissionError with the ProcessPoolExecutor method on a Windows machine? In that case I would be rather optimistic that we have solved your issue.

Please note that the try/except around os.rename is only there for debugging purposes; you will want to remove it.
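With the debugging scaffolding stripped out, the helper might look like the sketch below. It swaps in os.replace, which, unlike os.rename, also overwrites an existing destination on Windows, and derives the dir argument from the destination path rather than hard-coding ".":

```python
import json
import os
import tempfile


def json_dump(dest_file: str, data) -> None:
    # dir= keeps the temp file on the same filesystem as the destination,
    # so the rename stays atomic (and avoids cross-device link errors)
    with tempfile.NamedTemporaryFile(
        "w", dir=os.path.dirname(dest_file) or ".", delete=False
    ) as fo:
        json.dump(data, fo, indent=2)
        fo.flush()
        os.fsync(fo.fileno())
    # os.replace overwrites an existing destination on all platforms;
    # os.rename would raise FileExistsError on Windows
    os.replace(fo.name, dest_file)
```

This is only a sketch of where the answer's approach ends up, not code from the thread itself.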

Lydia van Dyke
  • Thanks for a very thorough answer. I am using `os.replace` as `os.rename` gives the following error on Windows when destination file already exists `[WinError 183] Cannot create a file when that file already exists` – oskros Jan 02 '21 at 10:39
  • Also, I tested your solution, and it does indeed seem to work with the example you gave. However, I am wondering whether there could be issues when two threads tries to save new data to the file, and that information from the first thread is lost when the second thread writes to file (as modifications happen by loading the file to memory, editing it, and saving it again) – oskros Jan 02 '21 at 10:40
  • 1
    Well, the results from one of the 2 threads will of course be lost. By what I understand this could mean that a change triggered by some user interaction would not persisted instantly but 30 seconds later when the periodic save job is executed again. – Lydia van Dyke Jan 02 '21 at 10:51
  • 1
    Makes sense, I will think about whether there is need for handling this further but probably not. Either way your solution solves my issue and also is a handy tool for debugging similar problems in the future, so thanks again. – oskros Jan 02 '21 at 10:55
  • Maybe add a timestamp argument to the save function so that it can drop outdated updates. – Lydia van Dyke Jan 02 '21 at 10:56
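The timestamp idea from the last comment could be sketched as follows (`json_dump_if_newer` and `_last_saved` are hypothetical names, not from the thread): each caller passes the time its data snapshot was taken, and the function drops any save whose data is older than what is already on disk.

```python
import json
import os
import tempfile
import threading

_lock = threading.Lock()
_last_saved = 0.0  # timestamp of the newest data written so far


def json_dump_if_newer(dest_file: str, data, timestamp: float) -> bool:
    """Write data atomically, but skip it if newer data was already saved."""
    global _last_saved
    with _lock:
        if timestamp <= _last_saved:
            return False  # outdated update, drop it
        with tempfile.NamedTemporaryFile(
            "w", dir=os.path.dirname(dest_file) or ".", delete=False
        ) as fo:
            json.dump(data, fo, indent=2)
            fo.flush()
            os.fsync(fo.fileno())
        os.replace(fo.name, dest_file)
        _last_saved = timestamp
        return True
```

The autosave thread would pass the time it snapshotted the data, so a UI-triggered save made in the meantime cannot be clobbered by a stale periodic write.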