27

I am most interested in the non-recursive case, but I am guessing others who might track this question would prefer seeing the recursive case.

Basically, we are aiming to accomplish:

rm -rf <target>

However, a system call would be an immature answer.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Setjmp
  • 27,279
  • 27
  • 74
  • 92

6 Answers6

62

Use the nftw() (File Tree Walk) function, with the FTW_DEPTH flag. Provide a callback that just calls remove() on the passed file:

#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <ftw.h>
#include <unistd.h>

int unlink_cb(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf)
{
    int rv = remove(fpath);

    if (rv)
        perror(fpath);

    return rv;
}

int rmrf(char *path)
{
    return nftw(path, unlink_cb, 64, FTW_DEPTH | FTW_PHYS);
}
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
caf
  • 233,326
  • 40
  • 323
  • 462
  • 6
    Wow, I learned something new. No idea that `remove` could remove directories. – R.. GitHub STOP HELPING ICE Mar 29 '11 at 12:25
  • 2
    `remove()` will only remove *blank* directories. – Jon Egeland Nov 10 '13 at 23:08
  • 4
    @Jon: Right, but the `FWD_DEPTH` flag to `nftw()` means that the contents of the directory will be removed before the directory itself is passed to `remove()`, so it will be empty at that point. – caf Nov 11 '13 at 12:17
  • 1
    I tried to use this elegant implementation, but it failed in one case when `rm -rf` succeded: if there is a lot of nested directories, `remove` will encounter ENAMETOOLONG error, but `rm -rf` will use `openat` and `unlinkat` to work with one directory name at a time and delete even deeply nested directory structures. – Orivej Desh May 22 '17 at 02:58
  • 1
    @OrivejDesh: Sounds like we need an `nnftw()` (new, new file tree walk) that passes a directory file descriptor and relative name :) – caf May 22 '17 at 03:15
22
  1. You need to use nftw() (or possibly ftw()) to traverse the hierarchy.
  2. You need to use unlink() to remove files and other non-directories.
  3. You need to use rmdir() to remove (empty) directories.

You would be better off using nftw() (rather than ftw()) since it gives you controls such as FTW_DEPTH to ensure that all files under a directory are visited before the directory itself is visited.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • 5
    +1 for recommending `nftw`, which had slipped my mind. Make sure you use `nftw` with the `FTW_DEPTH` option, otherwise you'll get a pre-order traversal, which won't work for `rmdir`-ing directories. – Dave Goodell Mar 29 '11 at 04:09
  • @Dave: Believe it or not - I added my comment about using `nftw()` and `FTW_DEPTH` before seeing your comment. I agree with you. – Jonathan Leffler Mar 29 '11 at 04:16
  • 1
    Every system that has `ntfw()` also has `remove()` that will behave like `rmdir()` when called on a directory and like `unlink()` when called on a path. So just always use `remov()` and use `ntfw` with `FTW_DEPTH` and that's it. – Mecki Aug 19 '16 at 11:05
  • Oh, and one should probably note that `ntfw()` is absolutely not thread-safe and it may even break if some other thread changes the current working directory - a good implementation would probably not care but POSIX standard says that `ntfw()` can rely that a set working directory stays set during processing and some simple implementations may do so. – Mecki Aug 19 '16 at 11:07
10

You can write own implementation command "rm -rf" on pure the C programming language. Source code based only on headers: dirent.h, sys/stat.h and unistd.h. If you need portable code to other systems, as example to the Windows, you need only change headers with corresponding functionality, at the same time the algorithm will not be changed.


A file rmtree.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

// POSIX dependencies
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>


void
rmtree(const char path[])
{
    size_t path_len;
    char *full_path;
    DIR *dir;
    struct stat stat_path, stat_entry;
    struct dirent *entry;

    // stat for the path
    stat(path, &stat_path);

    // if path does not exists or is not dir - exit with status -1
    if (S_ISDIR(stat_path.st_mode) == 0) {
        fprintf(stderr, "%s: %s\n", "Is not directory", path);
        exit(-1);
    }

    // if not possible to read the directory for this user
    if ((dir = opendir(path)) == NULL) {
        fprintf(stderr, "%s: %s\n", "Can`t open directory", path);
        exit(-1);
    }

    // the length of the path
    path_len = strlen(path);

    // iteration through entries in the directory
    while ((entry = readdir(dir)) != NULL) {

        // skip entries "." and ".."
        if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, ".."))
            continue;

        // determinate a full path of an entry
        full_path = calloc(path_len + strlen(entry->d_name) + 1, sizeof(char));
        strcpy(full_path, path);
        strcat(full_path, "/");
        strcat(full_path, entry->d_name);

        // stat for the entry
        stat(full_path, &stat_entry);

        // recursively remove a nested directory
        if (S_ISDIR(stat_entry.st_mode) != 0) {
            rmtree(full_path);
            continue;
        }

        // remove a file object
        if (unlink(full_path) == 0)
            printf("Removed a file: %s\n", full_path);
        else
            printf("Can`t remove a file: %s\n", full_path);
        free(full_path);
    }

    // remove the devastated directory and close the object of it
    if (rmdir(path) == 0)
        printf("Removed a directory: %s\n", path);
    else
        printf("Can`t remove a directory: %s\n", path);

    closedir(dir);
}


int
main(const int argc, char const *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Missing single operand: path\n");
        return -1;
    }

    rmtree(argv[1]);

    return 0;
}

Examine it.

I am use a shell script for generation a file/folder structure.

$ cat script.sh 

mkdir -p dir1/{dir1.1,dir1.2,dir1.3}
mkdir -p dir1/dir1.2/{dir1.2.1,dir1.2.2,dir1.2.3}
mkdir -p dir2/{dir2.1,dir2.2}
mkdir -p dir2/dir2.2/dir2.2.1
mkdir -p dir2/dir2.2/{dir2.2.1,dir2.2.2}
mkdir -p dir3/dir3.1
mkdir -p dir4
mkdir -p dir5

touch dir1/dir1.1/file.scala
touch dir1/dir1.2/file.scala
touch dir2/dir2.2/{file.c,file.cpp}
touch dir2/dir2.2/dir2.2.2/{file.go,file.rb}
touch dir3/{file.js,file.java}
touch dir3/dir3.1/{file.c,file.cpp}
> dir4/file.py

Run the script

$ ./script.sh 

Generated the file/folder structure

$ tree
.
├── dir1
│   ├── dir1.1
│   │   └── file.scala
│   ├── dir1.2
│   │   ├── dir1.2.1
│   │   ├── dir1.2.2
│   │   ├── dir1.2.3
│   │   └── file.scala
│   └── dir1.3
├── dir2
│   ├── dir2.1
│   └── dir2.2
│       ├── dir2.2.1
│       ├── dir2.2.2
│       │   ├── file.go
│       │   └── file.rb
│       ├── file.c
│       └── file.cpp
├── dir3
│   ├── dir3.1
│   │   ├── file.c
│   │   └── file.cpp
│   ├── file.java
│   └── file.js
├── dir4
│   └── file.py
├── dir5
├── rmtree.c
└── script.sh

16 directories, 13 files

Build the source code of the file rmtree.c by the GCC

$ cc -o -Wall -Werror -o rmtree rmtree.c

Remove a directory dir1/dir1.1

$ ./rmtree dir1/dir1.1
Removed a file: dir1/dir1.1/file.scala
Removed a directory: dir1/dir1.1

Remove a directory dir1/dir1.2

$ ./rmtree dir1/dir1.2
Removed a directory: dir1/dir1.2/dir1.2.3
Removed a file: dir1/dir1.2/file.scala
Removed a directory: dir1/dir1.2/dir1.2.1
Removed a directory: dir1/dir1.2/dir1.2.2
Removed a directory: dir1/dir1.2

Remove a directory dir1/

$ ./rmtree dir1
Removed a directory: dir1/dir1.3
Removed a directory: dir1

Remove a directory dir2/dir2.2/dir2.2.2

$ ./rmtree dir2/dir2.2/dir2.2.2
Removed a file: dir2/dir2.2/dir2.2.2/file.rb
Removed a file: dir2/dir2.2/dir2.2.2/file.go
Removed a directory: dir2/dir2.2/dir2.2.2

Remove a directory dir2/

$ ./rmtree dir2
Removed a directory: dir2/dir2.1
Removed a file: dir2/dir2.2/file.c
Removed a directory: dir2/dir2.2/dir2.2.1
Removed a file: dir2/dir2.2/file.cpp
Removed a directory: dir2/dir2.2
Removed a directory: dir2

Remove a directory dir3/dir3.1

$ ./rmtree dir3/dir3.1
Removed a file: dir3/dir3.1/file.c
Removed a file: dir3/dir3.1/file.cpp
Removed a directory: dir3/dir3.1

Remove a directory dir3

$ ./rmtree dir3
Removed a file: dir3/file.js
Removed a file: dir3/file.java
Removed a directory: dir3

Remove a directory dir4

$ ./rmtree dir4
Removed a file: dir4/file.py
Removed a directory: dir4

Remove a empty directory dir5

$ ./rmtree dir5
Removed a directory: dir5

If a passed path is not exists or is not path of a directory, you will can see next:

$ ./rmtree rmtree.c
Is not directory: rmtree.c
$ ./rmtree 11111111111111111
Is not directory: 11111111111111111

See results

$ tree
.
├── rmtree
├── rmtree.c
└── script.sh

0 directories, 3 files

Testing environment

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.7 (jessie)
Release:    8.7
Codename:   jessie
$ uname -a
Linux localhost 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux
$ cc --version
cc (Debian 4.9.2-10) 4.9.2
PADYMKO
  • 4,217
  • 2
  • 36
  • 41
  • 1
    Hi! You've missed a +1 for the "/" in the full_path calloc – debuti May 29 '18 at 13:48
  • 1
    Hi again! I forgot to mention that your code is leaking memory through the same variable full_path – debuti Jun 01 '18 at 13:23
  • Nice program, but with memory problems as debuti have already noted. – Kibernetik Aug 13 '19 at 18:07
  • `full_path = calloc(path_len + strlen(entry->d_name) + 1, sizeof(char));` How about freeing it? –  Jan 29 '20 at 04:27
  • jj, nice code, thanks. though carreful about those two bugs: path_len + strlen(entry->d_name) + 1 /* "/" */ + 1 /* eof */ and free memory after recurse call of rmtree – Ján Lazár May 06 '21 at 12:54
6

I just cracked open the GNU rm source and see what exactly it does:

http://www.gnu.org/software/coreutils/

rm relies on the following functions:

fts_open
fts_read
fts_set
fts_close

which have man pages on both linux and mac.

Setjmp
  • 27,279
  • 27
  • 74
  • 92
3

See man 2 unlink and man 2 rmdir for system calls that will delete files and (empty) directories respectively. All you need to do then in order to handle the recursive case is to traverse the target directory in a post-order depth-first traversal and delete each entry in that order with the correct deletion routine. You can use opendir, readdir, and closedir to traverse the directory structure.

Dave Goodell
  • 2,143
  • 16
  • 18
2

In pseudo code, here's the non-recursive approach I would take:

create a stack to hold directory names.
push argv contents onto the stack
while (stack !empty) {
    look at the top directory name on the stack
    for each item in directory {
        if (item is a directoy) {
            push it onto the stack
        } else {
            delete it
        }
    }
    if (no subdirs were pushed) {
        pop the top dir name from the stack
        delete it
    }
}

I'll leave implementing this in C as an exercise for the reader. :-)

(Edit: Also, unless this is purely a learning exercise, don't reinvent this wheel - it would be far easier, and thus less bug-prone, to use ftw or nftw as others have suggested.)

Sherm Pendley
  • 13,556
  • 3
  • 45
  • 57