If I want to empty a directory, is there any reason I shouldn't remove it and recreate it?

Question

I've looked at these two threads for emptying a directory:

However, the functions for emptying a directory given as answers to those threads are very complex, and one answer can encounter a stack overflow.

When I think of removing and adding the directory, I'm thinking of doing it in C, and on a Linux platform. However, if you want to also explain the reasoning behind this on Windows, I'll appreciate that as well. This is the code I would implement:

system("rm -rf /some/directory");
mkdir("/some/directory", 0700);

This seems so much simpler. What reason would I not use it? Is it far less efficient than it appears?

Jonathan Leffler · Accepted Answer · 2017-02-05T04:22:29.137

The rm -fr /some/directory command has to do the recursive work for you. Using the command is a lot simpler than writing your own code to do the same job; it embodies the virtue of laziness and exploits code reuse on the program scale. (You can't use the rmdir() system call on a directory unless it is already empty.) So the approach is not wholly unreasonable.

One issue is security: can someone place an alternative to the system's rm command on your path? On Unix, are /bin and /usr/bin first in your path, or do other directories come first?

If you decide that using /bin/rm (or /usr/bin/rm) is safe enough, that might be a better choice than an unadorned rm, but overall, that isn't too bad.
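One way to sidestep the PATH question is to skip the shell and run rm by absolute path. The following is a minimal sketch, assuming rm is installed at /bin/rm; the function name remove_tree_with_rm is made up for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "/bin/rm -rf -- <path>" without a shell.
 * Returns rm's exit status (0 on success), or -1 if rm could not be
 * started or did not exit normally.
 * Assumption: rm lives at /bin/rm on this system. */
int remove_tree_with_rm(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: an absolute path means PATH is never searched, and no
         * shell is involved, so the name is not reinterpreted. */
        char *const argv[] = { "rm", "-rf", "--", (char *)path, NULL };
        execv("/bin/rm", argv);
        _exit(127);                     /* reached only if execv failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;                  /* waitpid failed for real */
    }
    if (!WIFEXITED(status))
        return -1;                      /* killed by a signal, etc. */
    return WEXITSTATUS(status);
}
```

Calling remove_tree_with_rm("/some/directory") would stand in for the system() call in the question. The "--" stops rm from treating a name that starts with a dash as an option.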

Another issue is a different aspect of security: can you actually remove and create the directory? Can you write in /some or not? And should you keep the current owner, group and permissions of /some/directory if you remove and recreate it? After those operations, the directory will be owned by the effective UID of the process; it will belong to the group with the effective GID of the process (unless the set-group-ID bit is set on the /some directory, or unless you're on macOS, where it takes the group of /some); and the permissions will be the mode passed to mkdir() as modified by the current umask setting.
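If the old owner, group and permissions matter, one option is to record them with stat() before removing the directory and reapply them after recreating it. This is a sketch only: restore_dir_attributes is a hypothetical name, changing the owner normally requires privileges, and it does not cover ACLs or extended attributes.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reapply the owner, group and permission bits that
 * an earlier stat() call saved in *saved. Returns 0 on success, -1 on
 * failure (errno is set by chown() or chmod()). */
int restore_dir_attributes(const char *path, const struct stat *saved)
{
    /* chown() first: changing ownership may clear set-ID bits, so the
     * mode is applied afterwards. */
    if (chown(path, saved->st_uid, saved->st_gid) != 0)
        return -1;
    /* Keep only the permission and set-ID/sticky bits, not the file type. */
    if (chmod(path, saved->st_mode & 07777) != 0)
        return -1;
    return 0;
}
```

The intended order is: stat("/some/directory", &saved), remove the directory, mkdir() it again, then restore_dir_attributes("/some/directory", &saved).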

If these issues are not important, or are circumventable, then remove and recreate is plausible.

Expanding on the comments above:

"When you mentioned the system() call in your comment, I had no idea what you were talking about."

The system() function executes the string passed as an argument via a command interpreter. When you write code, you have to think about how it could go wrong when someone malicious is trying to get you to run it. When you write "rm -fr /some/directory", you are relying on the shell finding the normal rm command and working properly. However, if your PATH has a value such as $HOME/bin:/bin:/usr/bin (so that commands in your own private bin directory are used in preference to those provided by the system), then if the user can install their own script as $HOME/bin/rm, they can execute arbitrary code with your privileges, which could allow them to leave a way of breaking into your system later, keeping all the privileges you have. They might even clean up (most of) the evidence that there was once a $HOME/bin/rm script.

One way to avoid such a problem is to request "/bin/rm -fr /some/directory" (unless rm is in /usr/bin and not in /bin, of course). This is arguably safer. There used to be attacks available via the IFS environment variable; these are neutered by modern shells, which do not use any inherited value for IFS.

Note that one problem will be interpreting whether the rm command was successful. The -f option means it will report success under many circumstances (a directory that is already missing is not an error, for example), but if the directory didn't vanish, your mkdir() call will fail with EEXIST.

"As for the security of the /some/directory, my understanding of what you're trying to say, is that the wrong directory could be selected. I just can't imagine how that would occur."

Assuming that you did have permission to modify the /some directory (you need that to be able to remove /some/directory), and that you could modify all the sub-directories of the old version of /some/directory, then you might start off with /some/directory owned by user victim, group witless and with permission 0775, whereas after the command and system call (note that system() is a function rather than a system call; mkdir() is a system call) are successful, the directory might be owned by user victor, group mischief and with permission 0777. This might be less than ideal. If you don't want to break such permission settings, you probably don't want to use the remove and recreate technique, or at least not one as simple-minded as this example shows. You would then have to work a bit harder to remove the contents of the directory without modifying those attributes. You might scan the directory (opendir(), readdir(), closedir()) and invoke rm on each name, or on sets of names, to clean up thoroughly without removing the directory itself. This is of hybrid complexity. It is more fiddly than simply removing and recreating, but far less fiddly than dealing with a full recursive delete over multiple levels.
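A rough sketch of that hybrid approach follows, assuming rm is at /bin/rm. The names rm_rf and empty_directory are hypothetical, and for simplicity it runs one rm per entry rather than batching names.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/rm -rf -- path" without a shell; 0 on success, -1 otherwise.
 * Assumption: rm lives at /bin/rm. */
static int rm_rf(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "rm", "-rf", "--", (char *)path, NULL };
        execv("/bin/rm", argv);
        _exit(127);                     /* execv failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Hypothetical helper: remove everything inside dir, but leave dir itself
 * (and so its owner, group and permissions) untouched. */
int empty_directory(const char *dir)
{
    DIR *dp = opendir(dir);
    if (dp == NULL)
        return -1;

    int result = 0;
    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        /* Never hand "." or ".." to rm. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path || rm_rf(path) != 0)
            result = -1;                /* note the failure, keep going */
    }
    closedir(dp);
    return result;
}
```

Names beginning with a dot are removed too, since readdir() returns them like any other entry.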

As I said before, you have to decide whether these issues matter or not. It is important that you're aware that the issues exist, and that you have made a conscious (and informed) decision about whether to handle the issues, and how to handle the issues.

Remember that if your program may be run with root (administrator) privileges, it is a lot more important to be careful — but it still matters even if only ordinary mortal users will run the program.

I'm still a little confused about your explanation of security when it comes to the file paths, but your answer is perfectly sufficient to what I asked, so I'll mark it as the official answer. — Larrimus, Feb 05 '17 at 03:47
@Larrimus: thanks for the accept. Which part of the 'security for file paths' is confusing? Is it the security of the `system()` call and the `rm` command, or the security of the `/some/directory` you are going to remove and recreate? Or both...? — Jonathan Leffler, Feb 05 '17 at 03:51
I suppose it's both. When you mentioned the `system()` call in your comment, I had no idea what you were talking about. As for the security of the `/some/directory` my understanding of what you're trying to say, is that the wrong directory could be selected. I just can't imagine how that would occur. — Larrimus, Feb 05 '17 at 04:04
Since this is tagged Linux, I'd also add that `/some/directory` might also have extended access controls (ACLs) or attributes (xattrs) that might be needed, so removing the directory and recreating it is not necessarily equivalent to removing the contents of the directory. For the directory scanning, I'd recommend [`nftw()`](http://man7.org/linux/man-pages/man3/nftw.3.html) or [`fts()`](http://man7.org/linux/man-pages/man3/fts.3.html)-based tree walk, rather than opendir()/readdir()/closedir(). — Nominal Animal, Feb 05 '17 at 04:30
@NominalAnimal: What would be the advantage of `nftw()` or `fts` over `opendir()` et al? I'm thinking that the program would do its own variant of `xargs`, running the `rm -fr` command on lists of files. Of course, it might be advantageous to use `execv()` instead of `system()` in this context; it avoids the shell (mis)interpreting any special characters in file names, etc. But that's a refinement. — Jonathan Leffler, Feb 05 '17 at 04:34
@NominalAnimal: I agree that there are more attributes than just owner, group and permissions available on files and directories, but they're generally rarer (the owner, group and permissions are unavoidable; extended attributes and ACLs require extra work to add them). So, I skipped them as an issue as I felt that the OP would probably be confused by them. Having them mentioned in a comment is valuable, though. — Jonathan Leffler, Feb 05 '17 at 04:36
@JonathanLeffler: Agreed to latter. As to opendir()/readdir()/closedir(), they do not usually detect any changes in the directory (directories in the general case) being manipulated. Both nftw() and fts() are *supposed* to handle that gracefully, in an architecture-specific manner. Furthermore, using those to delete the entire tree depth first is simple, and avoids the need to execute `/bin/rm`. — Nominal Animal, Feb 05 '17 at 04:42
@NominalAnimal: Since the OP was worried about coding the recursive delete (I assume he's aware of `nftw()` at least; `fts` was new to me today, but I had come across it a few hours before you mentioned it) which was why `system("rm -fr /some/directory")` was considered at all. Yes, using one set of those file tree scanning functions allows you to avoid using the external program, but it also requires a modicum or two of care in the coding. I think we're more in violent agreement than any really fundamental disagreement — there are multiple ways to do it, and care and thought is required. — Jonathan Leffler, Feb 05 '17 at 04:50
@JonathanLeffler: Oh yes; I did not wish to imply any *change* in your answer; only additional suggestions to consider, as a comment to the answer. A third such suggestion also came to mind: if you want to use `/bin/rm` in xargs-like manner, with all the files and directories in `/some/directory/` as parameters, then I'd suggest using `glob("/some/directory/*", GLOB_ERR | GLOB_NOSORT | GLOB_PERIOD, errfunc, &glob)` to obtain those. If fork() is used, I'd recommend `execv("/bin/rm", { "rm", "-rf", "--", ... });` with `...` being the globbed names, split into parts if `E2BIG`. — Nominal Animal, Feb 05 '17 at 04:55
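For reference, the nftw() route raised in the comments might look roughly like the sketch below. The names remove_entry and empty_directory_nftw are hypothetical, and the descriptor limit of 16 is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700   /* must come before any #include to expose nftw() */
#include <ftw.h>
#include <stdio.h>

/* Callback for nftw(): delete every entry except the starting directory. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)typeflag;
    if (ftwbuf->level == 0)
        return 0;            /* level 0 is the starting directory: keep it */
    /* remove() unlinks files and removes empty directories; a non-zero
     * return value stops the walk and is passed back by nftw(). */
    return remove(path);
}

/* Hypothetical helper: empty dir depth first without removing dir itself.
 * Returns 0 on success, -1 on failure. */
int empty_directory_nftw(const char *dir)
{
    /* FTW_DEPTH: visit a directory's contents before the directory itself.
     * FTW_PHYS:  do not follow symbolic links; remove the links instead. */
    return nftw(dir, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

Because the walk is post-order, each sub-directory is already empty when remove() reaches it, and no external rm process is needed.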
