9

How can I securely remove a file using Python? The function os.remove(path) only removes the directory entry, but I want to securely remove the file, similar to the Apple feature called "Secure Empty Trash" that randomly overwrites the file.

What function securely removes a file using this method?

kyle k

  • This is not a feature of a programming language; it is a feature of the file system / operating system / storage device. – Elazar Jul 03 '13 at 18:12
  • IIRC, what Secure Empty Trash actually does is to unlink all the files, then do a single-pass random erasure immediately, then kick off a standard 35-pass erasure in the background. – abarnert Jul 03 '13 at 18:40
  • From what I know you can only overwrite a file on an HDD, not on an SSD, due to the way SSDs (flash memory) work. – kkonrad Aug 03 '21 at 15:19

5 Answers

13

You can use srm to securely remove files. You can use Python's os.system() function to call srm.
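As abarnert's comment below suggests, `subprocess.check_call` is a safer way to invoke the tool than `os.system`; a minimal sketch (this assumes the `srm` binary is installed and on your PATH):

```python
import subprocess

def secure_delete(path):
    # check_call avoids spawning a shell and raises CalledProcessError
    # if srm exits nonzero, so a failed wipe can't be silently
    # mistaken for success.
    subprocess.check_call(["srm", path])
```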

jh314

  • I'd use `subprocess.check_call` rather than `os.system`, for all the usual reasons. There's no need for the performance hit, hijacking potential, etc. in spawning a shell, and it's better to automatically check that the call succeeded than to forget to do it manually and assume you've secure-erased files when you really haven't. – abarnert Jul 03 '13 at 18:38
  • This served me well. Thanks. – chilliefiber Aug 01 '15 at 03:06
7

You can very easily write a function in Python to overwrite a file with random data, even repeatedly, then delete it. Something like this:

import os

def secure_delete(path, passes=1):
    # Open in binary append mode just to find the file's length.
    with open(path, "ba+") as delfile:
        length = delfile.tell()
    # Reopen in binary read/write mode so writes start at the beginning
    # (in append mode, every write would go to the end of the file).
    with open(path, "br+") as delfile:
        for i in range(passes):
            delfile.seek(0)
            delfile.write(os.urandom(length))
            delfile.flush()
            os.fsync(delfile.fileno())  # push the overwrite to disk
    os.remove(path)

Shelling out to srm is likely to be faster, however.
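As kindall notes in the comments, generating the random data in chunks avoids holding it all in memory for large files; a sketch of that variant (the 1 MB chunk size is an arbitrary choice):

```python
import os

def secure_delete_chunked(path, passes=1, chunk_size=1024 * 1024):
    # Overwrite the file in fixed-size chunks so only one chunk of
    # random data is ever held in memory at a time.
    length = os.path.getsize(path)
    with open(path, "br+") as delfile:
        for _ in range(passes):
            delfile.seek(0)
            remaining = length
            while remaining > 0:
                n = min(chunk_size, remaining)
                delfile.write(os.urandom(n))
                remaining -= n
            delfile.flush()
            os.fsync(delfile.fileno())  # push the overwrite to disk
    os.remove(path)
```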

kindall

  • That is a good idea, but is there an advantage to using `random.seed()` instead of `os.urandom(n)`? – kyle k Jul 03 '13 at 18:24
  • `os.urandom` will probably be (a lot) faster since you can get more than one byte at a time. You'll want to generate the random data in chunks (maybe 256K to 1MB at a time) to avoid needing to hold all the random data in memory. That will probably be about as fast as `srm`. – kindall Jul 03 '13 at 18:27
  • This won't be nearly as secure as using `srm`, and it may not be nearly as fast either. The Gutmann algorithm has been standardized for decades for a good reason. And `srm` on some platforms will take advantage of the built-in "Secure Erase" on some hard drives. – abarnert Jul 03 '13 at 18:36
  • `srm` is, however, a solution only on platforms that have `srm`. My point is that there is no reason you couldn't implement whatever secure-erasure algorithm you want in Python. My example wasn't meant to be canonical or anything; I didn't even test it. – kindall Jul 03 '13 at 18:43
  • Nice, but pylint complains: `"ba+" is not a valid mode for open. (bad-open-mode)` – Babken Vardanyan Aug 09 '14 at 08:17
  • This code can overwrite the contents of the file, and hence the HD where the file is located, multiple times. But does it actually delete the file name too? At the end of the code, the file is simply deleted using os.remove. Can an investigator discover that the deleted file used to exist even though the content may not be retrievable? – John Dec 22 '14 at 02:22
  • As mentioned in @phealy3330's comment, this opens the file in append mode, so all writes are appended to the end of the file and the data is not overwritten. Is there a better way to implement this using `r+b`, @kindall? It looks like `delfile.seek(0)` does not have an effect, as specified [here](https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function). – rafagarci Jul 20 '21 at 04:50
  • Yeah, you probably want `r+` rather than `a+`. – kindall Jul 20 '21 at 16:55
6

You can use srm, but you can also easily implement it yourself in Python. Refer to Wikipedia for the data patterns to overwrite the file content with; note that depending on the actual storage technology, the appropriate patterns may be quite different. Furthermore, if your file is located on a log-structured file system, or even on a file system with copy-on-write optimisation like btrfs, your goal may be unachievable from user space.

After you are done mashing up the disk area that was used to store the file, remove the directory entry with os.remove().

If you also want to erase any trace of the file name, you can try to allocate and deallocate a whole bunch of randomly named files in the same directory, though depending on the directory inode structure (linear, btree, hash, etc.) it may be very tough to guarantee you actually overwrote the old file name.
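A rough sketch of that filename-scrubbing idea: rename the file to same-length random names several times before deleting it. As noted above, whether the old directory entry is actually overwritten depends entirely on the directory's on-disk structure, so this is best-effort only:

```python
import os

def scrub_filename(path, attempts=26):
    # Rename the file to a same-length random name several times,
    # hoping each rename reuses (and so overwrites) the old directory
    # entry. No guarantee: btree/hash directories may keep old slots.
    directory, name = os.path.split(path)
    for _ in range(attempts):
        new_name = os.urandom(len(name)).hex()[:len(name)]
        new_path = os.path.join(directory, new_name)
        os.rename(path, new_path)
        path = new_path
    os.remove(path)
```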

Dima Tisnek

  • +1. But note that there are at least some platforms/filesystems where you _can_ do a secure erase from user space, but only by using some special API provided by the kernel/libc/fs. Which means using `srm` will work, but nothing you write in Python (unless you `ctypes` the special API) will. – abarnert Jul 03 '13 at 18:45
  • Meanwhile, it's probably worth looking at the `srm` for your platform (or, on a platform that doesn't have it, at least at some `srm`). For example, the source from [OS X 10.8](http://www.opensource.apple.com/source/srm/srm-7/srm/src/) is pretty simple if you know C at all and understand `fts` (which is like Python's `os.walk`); there's almost nothing else tricky there. – abarnert Jul 03 '13 at 18:49
1

At least in Python 3, using @kindall's solution I only got it to append: the entire contents of the file were still intact, and every pass just added to the overall size of the file. The result was [Original Contents][Random Data of that Size][Random Data of that Size][Random Data of that Size], which is obviously not the desired effect.

The following worked for me, though. I open the file in append mode to find the length, then reopen it in r+ so that I can seek to the beginning (in append mode, the cause of the undesired effect seems to be that every write goes to the end of the file regardless of seeking to 0).

So check this out:

import os

def secure_delete(path, passes=3):
    # Open in append mode just to find the file's length.
    with open(path, "ba+", buffering=0) as delfile:
        length = delfile.tell()
    # Reopen in r+ so that writes actually start at the beginning.
    with open(path, "br+", buffering=0) as delfile:
        #print("Length of file:%s" % length)
        for i in range(passes):
            delfile.seek(0, 0)
            delfile.write(os.urandom(length))
            #wait = input("Pass %s Complete" % i)
        #wait = input("All %s Passes Complete" % passes)
        # Final pass: overwrite with zeros, like shred -z.
        delfile.seek(0)
        for x in range(length):
            delfile.write(b'\x00')
        #wait = input("Final Zero Pass Complete")
    # Note: the TRUE shred also renames the file to all zeros (with the
    # length of the filename considered) to thwart metadata filename
    # collection; I didn't really care to implement that here.
    os.remove(path)

Un-comment the prompts to check the file after each pass. This looked good when I tested it, with the caveat that the filename is not shredded the way the real `shred -zu` does it.

phealy3330
0

The answers implementing a manual solution did not work for me. My solution is as follows; it seems to work okay.

import os

def secure_delete(path, passes=1):
    length = os.path.getsize(path)
    with open(path, "br+", buffering=-1) as f:
        for i in range(passes):
            f.seek(0)
            f.write(os.urandom(length))
    os.remove(path)
rafagarci