1

Many similar posts exist on Stack Overflow, I have carefully reviewed such posts and have linked them to the end of this post. These existing posts have not been helpful, because they demonstrate segfaults resulting from passing uninitialized portions of memory into the char *dest argument of the strncpy() function. Another common theme in these posts are strong recommendations against using strncpy(), but the only alternative recommendation I have read is to use strlcpy() instead.

My program uses a statically allocated array of FS_Info structs to store information about file system entries. The idea is to output the names and sizes of the 10 largest files in a specified directory. As the array is of fixed size, when a filesystem entry is found that is larger than the smallest entry in the array, my program attempts to update the struct describing this smaller entry with values describing the new, larger one.

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>

// size of array used to contain filesystem entries
const size_t fs_info_arr_size = 10;

// A struct to contain the name of a filesystem entry and its size in bytes.
typedef struct FS_Info
{
    char name[PATH_MAX];
    long long size;
} FS_Info;

// global pointer to FS_Info array
FS_Info *fs_info_arr[fs_info_arr_size];

// used to sort fs_entries array descending in terms of entry size
static int compare(const void *a, const void *b)
{
    const struct FS_Info *entryA = (FS_Info *)a;
    const struct FS_Info *entryB = (FS_Info *)b;

    return (entryB->size - entryA->size) - (entryA->size - entryB->size);
}

/*
Iterates over an array of FS_Info structs and returns a pointer to the struct having the smallest size member
*/
FS_Info *get_smallest_entry(FS_Info **entries)
{
    long long smallest = entries[0]->size;
    FS_Info *target;

    for (int i = 1; i < fs_info_arr_size * sizeof(FS_Info); i += sizeof(FS_Info))
    {
        if (entries[i]->size < smallest)
        {
            smallest = entries[i]->size;
            target = entries[i];
        }
    }
    return target;
}

/*
Add entires to the array. If the array is full, use the above function to find the
struct having the smallest file size, and if the current file size is larger, replace it.
*/
void update_fs_info_arr(char *path) // FIXME debug call stack shows that using strncpy here causes segfault
{
    static int items_added = 0;

    struct stat st;
    if (stat(path, &st) == 0)
    {
        if (items_added < fs_info_arr_size) // if array capacity will not be exceeded
        {
            strncpy(fs_info_arr[items_added]->name, path, PATH_MAX);
            // strlcpy(fs_info_arr[items_added]->name, path, sizeof(fs_info_arr[items_added]->name));
            // strncpy(fs_info_arr[items_added]->name, path, sizeof(fs_info_arr[items_added]->name) / sizeof(fs_info_arr[items_added]->name[0]) - 1);
            // fs_info_arr[items_added]->name[sizeof(fs_info_arr[items_added]->name) / sizeof(fs_info_arr[items_added]->name[0]) - 1] = 0;
            fs_info_arr[items_added]->size = st.st_size;

            items_added++;
        }
        else
        // find entry having the smallest size and replace it with the current entry if it is larger
        {
            FS_Info *smallest = get_smallest_entry(fs_info_arr);
            if (st.st_size > smallest->size)
            {
                strncpy(smallest->name, path, PATH_MAX);
                // strlcpy(smallest->name, path, sizeof(smallest->name));
                // strncpy(smallest->name, path, sizeof(smallest->name) / sizeof(smallest->name[0]) - 1);
                // smallest->name[sizeof(smallest->name) / sizeof(smallest->name[0]) - 1] = 0;
                smallest->size = st.st_size;
            }
        }
    }
    else
    {
        printf("Error getting stat for entry %s: %d\n", path, stat(path, &st));
    }
}

void walk(const char *currDir)
{
    DIR *dir = opendir(currDir);
    struct dirent *entry;

    if (dir == NULL)
    {
        printf("%s could not be opened", currDir);
        return;
    }

    while ((entry = readdir(dir)) != NULL)
    {
        // if directory is current dir or parent dir
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
        {
            continue;
        }

        char path_to_entry[PATH_MAX];
        snprintf(path_to_entry, sizeof(path_to_entry), "%s/%s", currDir, entry->d_name);
        //snprintf(path_to_entry, sizeof(path_to_entry) - 1, "%s/%s", currDir, entry->d_name);
        //path_to_entry[sizeof(path_to_entry) - 1] = '\0';

        update_fs_info_arr(path_to_entry);

        if (entry->d_type == DT_DIR)
        {
            walk(path_to_entry);
        }
    }
    closedir(dir);
}

int main(int argc, char *argv[])
{

    char target_dir[PATH_MAX];

    strncpy(target_dir, argv[1], PATH_MAX);

    printf("Finding the %zu largest files in: %s\n", fs_info_arr_size, target_dir);

    // recursively visit all entries in the specified directory
    walk(target_dir);

    // sort the entries descending by file size
    qsort(fs_info_arr, fs_info_arr_size, sizeof(*fs_info_arr), compare);

    // output ten largest files found
    for (int i = 0; i < fs_info_arr_size; i++)
    {
        printf("%s\t%lld\n", fs_info_arr[i]->name, fs_info_arr[i]->size);
    }

    return EXIT_SUCCESS;
}

The posts linked below fail to initialize memory being copied into by strncpy() or involve an incorrect size_t num argument as specified here. In the case of the later, I attempted changing my code to match the pattern described in the post linked first below, but this had no effect.

strncpy leading to segmentation fault

Segmentation fault when calling strcpy

Segmentation fault: 11 (caused by strncpy())

Why do I get a segmentation fault when using strncpy?

strncpy segfault

Segfault on strncpy call

Segmentation Fault when using strncpy in c

Edit:

I've made fixes as suggested in the answer section and realize why my program logic was flawed, but this program still results in a bad access:

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>

// size of array used to contain filesystem entries
const size_t fs_info_arr_size = 10;

/*
    A struct to contain the name of a filesystem entry and its size in bytes.
    An array of this type will be used to catalog all filesystem entries for
    the directory specified as command line argument.
*/
typedef struct FS_Info
{
    char name[PATH_MAX];
    long long size;
} FS_Info;

// global pointer to FS_Info array
FS_Info fs_info_arr[fs_info_arr_size];

// used to sort fs_entries array descending in terms of entry size
static int compare(const void *a, const void *b)
{
    const struct FS_Info *entryA = (FS_Info *)a;
    const struct FS_Info *entryB = (FS_Info *)b;

return entryB->size - entryA->size;
}

/*
Iterates over an array of FS_Info structs and returns a pointer to the struct
having the smallest size member.
*/
FS_Info *get_smallest_entry(FS_Info *entries)
{
    long long smallest = entries[0].size;
    FS_Info *target;

    for (int i = 1; i < fs_info_arr_size; i++)
    {
        if (entries[i].size < smallest)
        {
            smallest = entries[i].size;
            target = &entries[i];
        }
    }
    return target;
}

/*
Add entires to the array. If the array is full, use the above function to find the
struct having the smallest file size, and if the current file size is larger, replace it.
*/
void update_fs_info_arr(char *path) // FIXME debug call stack shows that using strncpy here causes segfault
{
    static int items_added = 0;

    struct stat st;
    if (stat(path, &st) == 0)
    {
        if (items_added < fs_info_arr_size) // if array capacity will not be exceeded
        {
            strncpy(fs_info_arr[items_added].name, path, PATH_MAX);
            fs_info_arr[items_added].size = st.st_size;

            items_added++;
        }
        else
        // find entry having the smallest size and replace it with the current entry if it is larger
        {
            FS_Info *smallest = get_smallest_entry(fs_info_arr);
            if (st.st_size > smallest->size)
            {
                strncpy(smallest->name, path, PATH_MAX);
                smallest->size = st.st_size;
            }
        }
    }
    else
    {
        printf("Error getting stat for entry %s: %d\n", path, stat(path, &st));
    }
}

void walk(const char *currDir)
{
    DIR *dir = opendir(currDir);
    struct dirent *entry;

    if (dir == NULL)
    {
        printf("%s could not be opened", currDir);
        return;
    }

    while ((entry = readdir(dir)) != NULL)
    {
        // if directory is current dir or parent dir
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
        {
            continue;
        }

        char path_to_entry[PATH_MAX];
        snprintf(path_to_entry, sizeof(path_to_entry), "%s/%s", currDir, entry->d_name);

        update_fs_info_arr(path_to_entry);

        if (entry->d_type == DT_DIR)
        {
            walk(path_to_entry);
        }
    }
    closedir(dir);
}

int main(int argc, char *argv[])
{

    // a char array to hold a filesystem path
    char target_dir[PATH_MAX];

    strncpy(target_dir, argv[1], PATH_MAX);

    printf("Finding the %zu largest files in: %s\n", fs_info_arr_size, target_dir);

    // recursively visit all entries in the specified directory
    walk(target_dir);

    // sort the entries descending by file size
    qsort(fs_info_arr, fs_info_arr_size, sizeof(*fs_info_arr), compare);

    // output ten largest files found
    for (int i = 0; i < fs_info_arr_size; i++)
    {
        printf("%s\t%lld\n", fs_info_arr[i].name, fs_info_arr[i].size);
    }

    return EXIT_SUCCESS;
}
Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108
John Harrington
  • 1,314
  • 12
  • 36
  • 3
    For one, `i += sizeof(FS_Info)` in `get_smallest_entry ` is horribly wrong. `i` is used as an array subscript. `i` should be bumped by `1` on each iteration. Likewise `i < fs_info_arr_size * sizeof(FS_Info)` is wrong. That should be `i < fs_info_arr_size` – WhozCraig Dec 29 '22 at 00:05
  • 1
    Have a look at this: https://stackoverflow.com/a/74905718/17592432 – Fe2O3 Dec 29 '22 at 00:11
  • 1
    `FS_Info *fs_info_arr[fs_info_arr_size];` is an array of pointers... Where do you assign them to point to blocks of memory??? – Fe2O3 Dec 29 '22 at 00:16
  • @WhozCraig yikes I can't believe I wrote that.... thanks for pointing this out.... – John Harrington Dec 29 '22 at 00:18
  • 1
    re Version 2: `get_smallest_entry()` ... Try assigning `target` to point to the 0th element at the top of the function... Potential to return uniniatilized pointer that is immediately dereferenced by the calling function. – Fe2O3 Dec 29 '22 at 00:49
  • 1
    @Fe2O3 that did the trick! You are the C magician. Next, I am going to work on implementing the alternate approach you linked. – John Harrington Dec 29 '22 at 00:52
  • 1
    Glad you've got the results you want from your version... Lesson from last correction: "ALWAYS initialise variables when/where they are defined." Otherwise, sporadic works/fails symptoms will age you prematurely... Cheers! `:-)` – Fe2O3 Dec 29 '22 at 01:02

1 Answers1

1

OP: "My program uses a statically allocated array of FS_Info structs to store information about file system entries."

Code:

FS_Info *fs_info_arr[fs_info_arr_size];

You are allocating pointers, but never allocate space to store what the pointers might point at...

Try

FS_Info fs_info_arr[fs_info_arr_size];

and make the correct adaptations to the code to use the array of structs (not the array of pointers that are all NULL.)

And...

static int compare(const void *a, const void *b)
{
    const struct FS_Info *entryA = (FS_Info *)a;
    const struct FS_Info *entryB = (FS_Info *)b;

    return (entryB->size - entryA->size) - (entryA->size - entryB->size);
}

is needlessly complex. For descending ordering simply use

return entryB->size - entryA->size;

(and you can apologise to strncpy() for the inappropriate slurring of its reputation.)

Fe2O3
  • 6,077
  • 2
  • 4
  • 20
  • Of course if you use strncpy to copy a string that doesn’t fit, strncpy will survive, but the destination doesn’t contain a valid string and will likely cause trouble later. Plus more trouble if your strings are utf-8. – gnasher729 Dec 29 '22 at 00:26
  • 1
    @gnasher729 Noted, and caution is called for... _IF_ the OP trusted the library function and (instead of blaming that function) had examined their own code for imperfections, this question would not arise... (Also, the "PATH_MAX" or "MAX_PATH" provides a modicum of safety to use `strncpy()` in this particular instance... This isn't the wild west of accepting user input with `gets()`... – Fe2O3 Dec 29 '22 at 00:30
  • Thanks for pointing out the flawed logic involved in using an array of struct pointers. I am editing my question to include updated code which still results in `Bus Error` with `Bad Access Code 2`. I will include the full program so that you may try on your system if you wish. – John Harrington Dec 29 '22 at 00:38
  • @JohnHarrington When you edit your code, please use some notation that does not invalidate current comments or answers.... Creative use of `#if 0 // was bad` `#else` and `#endif` will, at least, retain some of the past code... Did you examine the version at the link I provided in a comment under your question? Much, much less code. Fewer places for bugs to hide... – Fe2O3 Dec 29 '22 at 00:42
  • 1
    @Fe2O3 I've reverted my edits to the original code block such that it demonstrates that awful array access, and have included a new code block to demonstrate the changes that I have made. As I am new to C, I wanted to see this approach through, but am looking your earlier implementation, a bit embarrassed to say I have a hard time following it. I will try this approach next, though. – John Harrington Dec 29 '22 at 00:49
  • @JohnHarrington We were all beginners once. The version in that other question was primarily to show how `memove()` could be used to keep the array of `biggies[]` in sorted order (insertion sort... not the best, but demonstrates `memmove()`...) Notice, too, there are no extra 'helper functions' in that code. Sometimes, "factoring out" code just means the reader has to break focus to find & read distant code... It's a balancing act, aka "an opinion", and we all have our own personal preferences. `:-)` – Fe2O3 Dec 29 '22 at 00:59