
I am trying to write an up-to-date Python function for removing files securely. This quest led me to this Stack Overflow question, the answers to which, as I understand it, give me two options:

  1. Install the srm command, and call that using subprocess or something similar.
  2. Use a given custom function.

I have to reject option (1) because, although I am using Linux myself, I am writing code destined for a custom PIP package, which needs to be as lightweight and portable as possible.

As for option (2): I have distilled the various functions supplied in the answers to the aforementioned question into one function:

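import os

def secure_delete(path_to_file, passes=1):
    length = os.path.getsize(path_to_file)
    # Open for reading and writing in binary mode, without truncating.
    with open(path_to_file, "br+", buffering=-1) as file_to_overwrite:
        for _ in range(passes):
            # Rewind so that each pass overwrites the file from the start.
            file_to_overwrite.seek(0)
            file_to_overwrite.write(os.urandom(length))
    os.remove(path_to_file)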

Now this looks like we could be getting somewhere, but I still have some queries:

  • I believe that the os.path stuff has largely been superseded by pathlib. Is that correct?
  • But what about that os.urandom(length) call? Is that the most efficient, up-to-date way of doing that?
  • I understand what that passes variable is doing, but I do not understand what the point of it is. Is there really all that much to be gained, from a security point of view, by overwriting multiple times?
Tom Hosker
  • 1
    It was (still is?) the case that writing was thought to leave a ghost of the prior value, so secure delete tools would overwrite the file contents several times. – JonSG Jul 20 '23 at 19:20
  • 2
    _Is there much gained by overwriting multiple times_ For certain kinds of physical storage mediums (e.g. magnetic platters), yes. For other kinds, perhaps not. – John Gordon Jul 20 '23 at 19:24
  • 1
    `os.path` vs. `pathlib`: I wouldn't ding somebody for not using `pathlib`, especially here where you're not using any of its extra functionality. `os.urandom`: since Python 3.6, there's the `secrets` module that gives you an interface to the best-available random number source. Number of passes: more is probably better, but as always "what's your threat model?" Are you trying to prevent your roommate from reading the deleted data, or a global superpower? `srm` also renames the file to something random and truncates it to 0 bytes – bbayles Jul 20 '23 at 19:52
  • @bbayles If you were inclined to expand that comment into an answer, and if you could spell out explicitly what the `secrets` equivalent to `os.urandom` is - in the context of the code in the question, at least - then I would happily give it the green tick of acceptance. (I am already using `secrets` in another part of the same file, so using `secrets` would dovetail really nicely with the rest of the code.) – Tom Hosker Jul 20 '23 at 20:27

1 Answer


The pathlib equivalent of os.path.getsize(path_to_file) would be Path(path_to_file).stat().st_size. I think the os.path version is cleaner and would use it unless I needed something else from pathlib.
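To make the comparison concrete, here is a minimal sketch of the two spellings side by side. The helper names size_with_os_path and size_with_pathlib are made up for illustration; only os.path.getsize and Path.stat().st_size come from the standard library.

```python
import os
from pathlib import Path


def size_with_os_path(path_to_file):
    """Return the file size in bytes using os.path."""
    return os.path.getsize(path_to_file)


def size_with_pathlib(path_to_file):
    """Return the file size in bytes using pathlib."""
    return Path(path_to_file).stat().st_size
```

Both should report the same number of bytes for the same file, so the choice is mostly a matter of taste and of what else the surrounding code already imports.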


The secrets module ought to be used any time you're not sure about the best way to do something with random numbers.

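secrets.SystemRandom().randbytes(n) will give you the best available set of n random bytes (randbytes requires Python 3.9 or later). Under the hood it uses os.urandom. secrets.token_bytes(n) does the same job and has been available since Python 3.6.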
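In the context of the code in the question, the swap might look like the sketch below. It assumes secrets.token_bytes(n), which returns n random bytes from the same operating system source and works on Python 3.6 and later; the rest is the question's function with the open() argument corrected to path_to_file.

```python
import os
import secrets


def secure_delete(path_to_file, passes=1):
    """Overwrite a file with random bytes, then remove it."""
    length = os.path.getsize(path_to_file)
    with open(path_to_file, "br+") as file_to_overwrite:
        for _ in range(passes):
            # Rewind so that each pass starts at the beginning of the file.
            file_to_overwrite.seek(0)
            # token_bytes(length) builds the whole buffer in memory at once,
            # just as os.urandom(length) did in the original.
            file_to_overwrite.write(secrets.token_bytes(length))
    os.remove(path_to_file)
```

Behaviour should be the same as with os.urandom; the gain is consistency if the rest of the file already uses secrets.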


Overwriting a file's contents multiple times theoretically impairs forensic recovery methods from retrieving them, but how much extra safety it buys you depends on various factors.

Wei et al. (2011) found some amount of leftover data even when using schemes with dozens of overwrite passes.
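If you do want several passes, one thing worth considering is that writes left sitting in a buffer may never reach the disk as separate passes. The sketch below is one way to handle that, not a guarantee of anything: it writes in chunks and calls flush() and os.fsync() after each pass to ask the operating system to push the data out. The function name overwrite_in_passes, the chunk_size parameter, and the 1 MiB default are all assumptions made for illustration.

```python
import os
import secrets

CHUNK_SIZE = 1024 * 1024  # hypothetical default: 1 MiB per write


def overwrite_in_passes(path_to_file, passes=3, chunk_size=CHUNK_SIZE):
    """Overwrite a file in place with random bytes; return its length."""
    length = os.path.getsize(path_to_file)
    with open(path_to_file, "br+") as file_to_overwrite:
        for _ in range(passes):
            file_to_overwrite.seek(0)
            remaining = length
            while remaining > 0:
                step = min(chunk_size, remaining)
                file_to_overwrite.write(secrets.token_bytes(step))
                remaining -= step
            # Empty Python's buffer, then ask the OS to write to the device.
            file_to_overwrite.flush()
            os.fsync(file_to_overwrite.fileno())
    return length
```

This only controls what the program asks for. Whether the old blocks are really overwritten still depends on the storage medium and the filesystem, so the caveats above continue to apply.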

bbayles
  • *Overwriting a file's contents multiple times theoretically impairs forensic recovery methods from retrieving them* Beware copy-on-write filesystems, where it does nothing because it won't actually overwrite the data. – Andrew Henle Jul 23 '23 at 01:06