1

I want to calculate the size of a directory (path) recursively. My current code has a function that checks whether each entry is a directory or a file: if it is a directory, the function calls itself with the subdirectory, and if it is a file, it adds the file's size to the totalSize variable. However, my current code doesn't return anything, which means there is an error somewhere. Here is my code:

#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>

void getsize(const char *path,int* totalSize);

int main()
{
    int total = 0;
    char path [] = "C:\\Users\\abbas\\Desktop\\Leetcode airplane";
    getsize(path,&total);
    printf("%d",total);
    return total;

}


void getsize(const char *path,int* totalSize)
{
    struct dirent *pDirent;
    struct stat buf;
    struct stat info;
    DIR *pDir;
    int exists;
    char str[100];
    pDir = opendir (path);
    while ((pDirent = readdir(pDir)) != NULL)
    {
        stat(pDirent->d_name,&info);
        if(S_ISDIR(info.st_mode))
        {
            strcpy(str,path);
            strcat(str,"/");
            strcat(str,pDirent->d_name);
            getsize(str,totalSize);
        }
        else
        {
            strcpy(str,path);
            strcat(str,"/");
            strcat(str,pDirent->d_name);
            exists = stat(str,&buf);
            if (exists < 0)
            {
                continue;
            }
            else
            {
                (*totalSize) += buf.st_size;
            }

        }
    }
    closedir(pDir);
}
  • `printf("%d\n", total);` ? – dimich Feb 05 '23 at 04:34
  • As an aside, your `main` should return `0`, not the total, which will be taken as an error indication. `0` means success, non-zero means error. Also, add a newline to the end of the print format. Never just terminate your strings mid-line with no newline. – Tom Karzes Feb 05 '23 at 04:35
  • You need to include string.h, and that fixed size str[100] is problematic. – Allan Wind Feb 05 '23 at 04:45
  • I'd be surprised indeed if you're not running into the problem described in [`stat()` error "no such file or directory" when file name is returned by `readdir()`](https://stackoverflow.com/questions/5125919/stat-error-no-such-file-or-directory-when-file-name-is-returned-by-readdir) – Jonathan Leffler Feb 05 '23 at 04:48
  • I am getting "free(): invalid pointer" from infinite recursion on //../. – Allan Wind Feb 05 '23 at 04:53

3 Answers

5
  1. Include string.h.
  2. The arbitrary fixed-size str[100] is problematic. If you are on Linux, include linux/limits.h and use str[PATH_MAX], or even better pathconf(path, _PC_NAME_MAX). In either case you should either ensure the buffer is big enough (using snprintf() for instance), or dynamically allocate the buffer.
  3. You need to exclude . and .., otherwise you end up with an infinite loop (path/../path/..).
  4. stat(pDirent->d_name,&info) fails because you need to stat() path/pDirent->d_name, not just pDirent->d_name.
  5. (not fixed) Maybe snprintf(path2, sizeof path2, "%s%s%s", path, PATH_SEP, pDirent->d_name) instead of strcpy() and strcat()?
  6. Check the return values of functions, otherwise you are wasting time.
  7. There is no point in doing two stat() calls on the same path, so just use (*totalSize) += info.st_size; with the result of the first call.
  8. (not fixed) On Windows, consider using _stat64() with the address of a struct __stat64 (@AndrewHenle).
  9. I assume you only want the size of files.
  10. (not fixed) It would be more natural if getsize() returned the size instead of using int *totalSize out parameter.
  11. (not fixed) Consider using nftw() (or the older ftw()) to walk the tree.

Note that the program now accepts the path on the command line for testing purposes.

#include <dirent.h>
#include <errno.h>
#include <linux/limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* A string (not a single char) so it can be passed to strcat() */
const char *PATH_SEP =
#ifdef _WIN32
    "\\";
#else
    "/";
#endif

void getsize(const char *path,int *totalSize) {
    struct dirent *pDirent;
    DIR *pDir = opendir (path);
    if(!pDir) {
        /* readdir(NULL) would be undefined behavior */
        perror("opendir");
        return;
    }
    while ((pDirent = readdir(pDir)) != NULL) {
        if(
            !strcmp(pDirent->d_name, ".") ||
            !strcmp(pDirent->d_name, "..")
        )
            continue;

        char path2[PATH_MAX];
        strcpy(path2, path);
        strcat(path2, PATH_SEP);
        strcat(path2, pDirent->d_name);
        struct stat info;
        if(stat(path2, &info) == -1) {
            perror("stat");
            closedir(pDir); /* don't leak the directory stream */
            return;
        }
        if(S_ISDIR(info.st_mode))
            getsize(path2, totalSize);
        else if(S_ISREG(info.st_mode))
            (*totalSize) += info.st_size;
    }
    closedir(pDir);
}

int main(int argc, char *argv[]) {
    if(argc != 2) {
        printf("usage: your_program path\n");
        return 1;
    }
    int total = 0;
    getsize(argv[1], &total);
    printf("%d\n",total);
}

And an example test:

$ mkdir -p 1/2
$ dd if=/dev/zero of=1/file count=123
123+0 records in
123+0 records out
62976 bytes (63 kB, 62 KiB) copied, 0.000336838 s, 187 MB/s
$ dd if=/dev/zero of=1/2/file count=234
234+0 records in
234+0 records out
119808 bytes (120 kB, 117 KiB) copied, 0.0015842 s, 75.6 MB/s
$ echo $((62976 + 119808))
182784
$ ./your_program 1
182784
Allan Wind
  • May be worth a mention, if all that is wanted is a total, POSIX `nftw()` (or `ftw()`) will walk the entire directory subtree allowing for the total calculation -- while taking advantage of the efficiency and error checks those functions provide. – David C. Rankin Feb 05 '23 at 06:10
  • @DavidC.Rankin Thanks. I added a note as suggested. – Allan Wind Feb 05 '23 at 06:16
  • Lots of good advice in the answer. If I could vote twice I would. – David C. Rankin Feb 05 '23 at 06:25
  • Minor points: if by "total size", OP means "disk space used", `(*totalSize) += info.st_size;` does not account for sparse files using less disk space than the size of the file would otherwise indicate; and using an automatic variable for `path2` could cause issues for deep directory trees; and really pedantic: `PATH_MAX` won't be defined on systems that have different path limits on different filesystem implementations and that strictly adhere to POSIX specifications. In that case `[f]pathconf(_PC_NAME_MAX)` should be used to get the max name length for that particular file system. – Andrew Henle Feb 05 '23 at 10:21
  • And [Windows `stat()` isn't quite so simple](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions?view=msvc-170), with 6 different `stat()` implementations for various 32/64 bit combinations of values (12 counting wide characters). On Windows, it's probably best to use `_stat64()` with the address of a `struct __stat64` passed. – Andrew Henle Feb 05 '23 at 10:28
  • @AndrewHenle Thanks. I don't use Windows so I don't have a way of testing the latter suggestion but I will note it above. What is your concern re path2 and deep directory trees? – Allan Wind Feb 05 '23 at 10:32
  • @AllanWind `PATH_MAX` is usually something like 4096 - large enough that recursion in deep directory trees could cause stack overflow, especially for multithreaded processes where a thread's stack might be both smaller than the main thread's stack and unable to grow. Stack overflow isn't really much of a problem for single-threaded 64-bit processes on current systems with GBs of RAM. The second is that `PATH_MAX` really isn't the max path length possible. See [POSIX `pathconf()` **RATIONALE**](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/pathconf.html#tag_16_157_08). – Andrew Henle Feb 05 '23 at 10:47
  • (cont) "The value returned for the variable {PATH_MAX} indicates the longest **relative** pathname that could be given if the specified directory is the current working directory of the process." – Andrew Henle Feb 05 '23 at 10:49
3

I think the major error of your code lies in the recursive logic.

To quote p. 183 of The C Programming Language:

Each directory always contains entries for itself, called ".", and its parent, ".."; these must be skipped, or the program will loop forever.

Therefore, maybe you can try adding the following if test at the beginning of the while loop:

while ((pDirent = readdir(pDir)) != NULL)
{
    if (strcmp(pDirent->d_name, ".") == 0
        || strcmp(pDirent->d_name, "..") == 0)
        continue;  /* skip self and parent */
    /* ... */
}

Still, there might be other errors, but I think this one is the most significant.

Jonathan Leffler
Wert
1

Practice safe coding.

The code below risks a buffer overflow.

        // Risky
        strcpy(str,path);
        strcat(str,"/");
        strcat(str,pDirent->d_name);

Had the code instead done this:

int len = snprintf(str, sizeof str, "%s/%s", path, pDirent->d_name);
if (len < 0 || (unsigned) len >= sizeof str) {
  fprintf(stderr, "Path too long %s/%s\n", path, pDirent->d_name);
  exit (-1);  
}

Then the code would have readily errored out due to recursion on "." and ".." and led to OP's self-discovery of a key problem.

This makes for faster code production and more resilient code. It saves OP time.
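
Where a fixed-size buffer is undesirable altogether, one alternative is to size the buffer from the path itself. The following is a minimal sketch, assuming a hypothetical helper named join_path; it relies on the C99 behavior that snprintf() with a NULL buffer and size 0 returns the length the output would have had.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated "dir/name" string,
 * or NULL on failure. The caller must free() the result. */
char *join_path(const char *dir, const char *name) {
    /* First pass only measures: nothing is written when the size is 0 */
    int len = snprintf(NULL, 0, "%s/%s", dir, name);
    if (len < 0)
        return NULL;
    char *joined = malloc((size_t)len + 1); /* +1 for the terminating NUL */
    if (joined == NULL)
        return NULL;
    /* Second pass writes into a buffer that is exactly large enough */
    snprintf(joined, (size_t)len + 1, "%s/%s", dir, name);
    return joined;
}
```

In the loop, this would be used as `char *child = join_path(path, pDirent->d_name);` followed by a NULL check and a matching `free(child);`. It also keeps large path buffers off the stack during deep recursion, at the cost of one allocation per entry.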

chux - Reinstate Monica