1

So I am writing a little function to parse paths, it looks like this:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int parse_path() {
    char *pathname = "this/is/a/path/hello";
    int char_index = 0;
    char current_char = pathname[char_index];
    char *buffer = malloc(2 * sizeof(char));
    char *current_char_str = malloc(2 * sizeof(char));

    while (current_char != '\0' && (int)current_char != 11) {
        if (char_index == 0 && current_char == '/') {
            char_index++; current_char = pathname[char_index];
            continue;
        }

        while (current_char != '/' && current_char != '\0') {
            current_char_str[0] = current_char;
            current_char_str[1] = '\0';

            buffer = (char *)realloc(buffer, (strlen(buffer) + 2) * sizeof(char));
            strcat(buffer, current_char_str);

            char_index++; current_char = pathname[char_index];
        }

        if (strlen(buffer)) {
            printf("buffer(%s)\n", buffer);
            current_char_str[0] = '\0';
            buffer[0] = '\0';
        }

        char_index++; current_char = pathname[char_index];
    }
};

int main(int argc, char *argv[]) {
    parse_path();
    printf("hello\n");
    return 0;
}

Now, there is undefined behavior in my code, it looks like the printf call inside the main method is changing the buffer variable... as you can see, the output of this program is:

buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
buffer(buffer(%s)
)
buffer(hello)
hello

I have looked at other posts where the same sort of problem is mentioned and people have told me to use a static char array etc. but that does not seem to help. Any suggestions?

For some reason, at one time in this program the "hello" string from printf is present in my buffer variable.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Sebastian Karlsson
  • 715
  • 1
  • 8
  • 19
  • 2
    `strcat(buffer, current_char_str);` but `buffer` has never been initialized. – Paul Ogilvie Feb 10 '19 at 10:41
  • 2
    `realloc` may return null, especially when `strlen(buffer)` of the uninitialized buffe is too large. – Paul Ogilvie Feb 10 '19 at 10:42
  • 1
    @PaulOgilvie, Indeed I was wondering about the same thing `strcat(buffer, current_char_str);` and reading [this](https://stackoverflow.com/questions/18838933/why-do-i-first-have-to-strcpy-before-strcat) – Duck Dodgers Feb 10 '19 at 10:42
  • 1
    You do `strlen(buffer)` in the first `realloc` while buffer points to uninitialized memory – M.M Feb 10 '19 at 10:43
  • `buffer = (char*) realloc(buffer, (strlen(buffer) + 2) * sizeof(char));` shouldn't include `(char*)` (no need to cast return of `malloc,calloc,realloc`) and `* sizeof(char)` (always `1`). You should not `realloc` the pointer itself, instead use a temp pointer and validate `realloc` succeeded before assigning new block to `buffer`, e.g. `void *tmp = realloc(buffer, (strlen(buffer) + 2); if (!tmp) { perror ("realloc-tmp"); break; buffer = tmp;`. Then if `realloc` fails, you don't overwrite `buffer` with `NULL` creating a memory leak. – David C. Rankin Feb 10 '19 at 11:10
  • @DavidC.Rankin I did as you said and now I am getting the error string `"realloc-tmp"` inside my buffer variable ... hahah – Sebastian Karlsson Feb 10 '19 at 11:16
  • Missing `)` and `}` in @DavidC.Rankin's comment. `void *tmp = realloc(buffer, (strlen(buffer) + 2)); if (!tmp) { perror ("realloc-tmp"); break;} buffer = tmp;` – cdarke Feb 10 '19 at 11:22
  • Should be `void *tmp = realloc(buffer, strlen(buffer) + 2);` (too many `'('`) – David C. Rankin Feb 10 '19 at 11:27
  • @cdarke yes I figured, but still the same result :/ buffer variable still being affected by the printf and the perror string... – Sebastian Karlsson Feb 10 '19 at 11:29
  • 1
    That is solely due to `buffer` not being initialized after being allocated. Either make it the empty-string (e.g. `*buffer = 0;` following `malloc`), or use `calloc` instead to set the new memory all zero. – David C. Rankin Feb 10 '19 at 11:34

3 Answers3

4

Sebastian, if you are still having problems after @PaulOgilvie answer, then it is most likely due to not understanding his answer. Your problem is due to buffer being allocated but not initialized. When you call malloc, it allocates a block of at least the size requested, and returns a pointer to the beginning address for the new block -- but does nothing with the contents of the new block -- meaning the block is full random values that just happened to be in the range of addresses for the new block.

So when you call strcat(buffer, current_char_str); the first time and there is nothing but random garbage in buffer and no nul-terminating character -- you do invoke Undefined Behavior. (there is no end-of-string in buffer to be found)

To fix the error, you simply need to make buffer an empty-string after it is allocated by setting the first character to the nul-terminating character, or use calloc instead to allocate the block which will ensure all bytes are set to zero.

For example:

int parse_path (const char *pathname)
{
    int char_index = 0, ccs_index = 0;
    char current_char = pathname[char_index];
    char *buffer = NULL;
    char *current_char_str = NULL;

    if (!(buffer = malloc (2))) {
        perror ("malloc-buffer");
        return 0;
    }
    *buffer = 0;    /* make buffer empty-string, or use calloc */
    ...

Also do not hardcode paths or numbers (that includes the 0 and 2, but we will let those slide for now). Hardcoding "this/is/a/path/hello" within parse_path() make is a rather un-useful function. Instead, make your pathname variable your parameter so I can take any path you want to send to it...

While the whole idea of realloc'ing 2-characters at a time is rather inefficient, you always need to realloc with a temporary pointer rather than the pointer itself. Why? realloc can and does fail and when it does, it returns NULL. If you are using the pointer itself, you will overwrite your current pointer address with NULL in the event of failure, losing the address to your existing block of memory forever creating a memory leak. Instead,

            void *tmp = realloc (buffer, strlen(buffer) + 2);
            if (!tmp) {
                perror ("realloc-tmp");
                goto alldone;           /* use goto to break nested loops */
            }
            ...
    }
    alldone:;

    /* return something meaningful, your function is type 'int' */
}

A short example incorporating the fixes and temporary pointer would be:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int parse_path (const char *pathname)
{
    int char_index = 0, ccs_index = 0;
    char current_char = pathname[char_index];
    char *buffer = NULL;
    char *current_char_str = NULL;

    if (!(buffer = malloc (2))) {
        perror ("malloc-buffer");
        return 0;
    }
    *buffer = 0;    /* make buffer empty-string, or use calloc */

    if (!(current_char_str = malloc (2))) {
        perror ("malloc-current_char_str");
        return 0;
    }

    while (current_char != '\0' && (int) current_char != 11) {
        if (char_index == 0 && current_char == '/') {
            char_index++;
            current_char = pathname[char_index];
            continue;
        }

        while (current_char != '/' && current_char != '\0') {
            current_char_str[0] = current_char;
            current_char_str[1] = '\0';

            void *tmp = realloc (buffer, strlen(buffer) + 2);
            if (!tmp) {
                perror ("realloc-tmp");
                goto alldone;
            }
            strcat(buffer, current_char_str);

            char_index++;
            current_char = pathname[char_index];
        }

        if (strlen(buffer)) {
            printf("buffer(%s)\n", buffer);
            current_char_str[0] = '\0';
            buffer[0] = '\0';
        }

        if (current_char != '\0') {
            char_index++;
            current_char = pathname[char_index];
        }
    }
    alldone:;

    return ccs_index;
}

int main(int argc, char* argv[]) {

    parse_path ("this/is/a/path/hello");
    printf ("hello\n");

    return 0;
}

(note: your logic is quite tortured above and you could just use a fixed buffer of PATH_MAX size (include limits.h) and dispense with allocating. Otherwise, you should allocate some anticipated number of characters for buffer to begin with, like strlen (pathname) which would ensure sufficient space for each path component without reallocating. I'd rather over-allocate by 1000-characters than screw up indexing worrying about reallocating 2-characters at a time...)

Example Use/Output

> bin\parsepath.exe
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
hello

A More Straight-Forward Approach Without Allocation

Simply using a buffer of PATH_MAX size or an allocated buffer of at least strlen (pathname) size will allow you to simply step down your string without any reallocations, e.g.

#include <stdio.h>
#include <limits.h>  /* for PATH_MAX - but VS doesn't provide it, so we check */

#ifndef PATH_MAX
#define PATH_MAX  2048
#endif

void parse_path (const char *pathname)
{
    const char *p = pathname;
    char buffer[PATH_MAX], *b = buffer;

    while (*p) {
        if (*p == '/') {
            if (p != pathname) {
                *b = 0;
                printf ("buffer (%s)\n", buffer);
                b = buffer;
            }
        }
        else
            *b++ = *p;
        p++;
    }
    if (b != buffer) {
        *b = 0;
        printf ("buffer (%s)\n", buffer);
    }
}

int main (int argc, char* argv[]) {

    char *path = argc > 1 ? argv[1] : "this/is/a/path/hello";
    parse_path (path);
    printf ("hello\n");

    return 0;
}

Example Use/Output

> parsepath2.exe
buffer (this)
buffer (is)
buffer (a)
buffer (path)
buffer (hello)
hello

Or

> parsepath2.exe another/path/that/ends/in/a/filename
buffer (another)
buffer (path)
buffer (that)
buffer (ends)
buffer (in)
buffer (a)
buffer (filename)
hello

Now you can pass any path you would like to parse as an argument to your program and it will be parsed without having to change or recompile anything. Look things over and let me know if you have questions.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • 1
    Deserves more than just +1. :) – Duck Dodgers Feb 10 '19 at 12:17
  • Just glad to help. You could tell there was still a bit of confusion that needed to be removed. Hopefully this will help. – David C. Rankin Feb 10 '19 at 12:18
  • That is a very extensive answer! – Paul Ogilvie Feb 10 '19 at 12:28
  • 1
    You lead the horse to water, but for some reason he wouldn't drink. So I just took the water to him and poured it over his head `:)` Dunno if he actually drank, but he sure got wet... – David C. Rankin Feb 10 '19 at 12:34
  • Thank you very much for your descriptive answer.... the funny thing is though, even if I run your code that supposedly is a "fix", I still get the same problem where the printf is writing to the buffer variable. Can there be something wrong with my compiler or something? I am using `gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)` – Sebastian Karlsson Feb 10 '19 at 18:18
  • @SebastianKarlsson - that's not possible unless there is a character-set issue. You say you are using `gcc 6.3.0`, but on what? Is it in WSL, or is this a MinGW install. The only thing I can think of is you have either captured a carriage-return in the text you are using as input, you are using UTF-8 (or UTF-16) character set, or something in that manner. Copy and paste my code into an editor and make sure you are saving with `'\n'` line endings. Then compile `gcc -Wall -Wextra -pedantic -std=c11 -Ofast -o parsepath parsepath.c` (that's `'Oh fast'` not zero) and try again. (either version) – David C. Rankin Feb 11 '19 at 02:13
  • @DavidC.Rankin still the same problem even if I compile with your compile optionss :/ I am saving your code in ViM on a Linux machine so it should not be a problem – Sebastian Karlsson Feb 11 '19 at 10:33
  • What is your current setup? What operating system, what compiler, etc..? I have compiled and run both on Linux and Windows using gcc and VS and no issues (even compiling with `-Wall -Wextra -pedantic` on gcc and with `/W3` on VS). That's why I say there is a character set problem somewhere. Are you sure the double-quotes you are using are not actually web-left/right non-ASCII quotes? It is going to be a subtle problem like that. – David C. Rankin Feb 11 '19 at 10:41
  • @DavidC.Rankin I am running Debian 9 on a Dell XPS 13.... yes the file is saved in correct format! Now this is weird, I got it working by adding changing the condition in the outer while loop to: `c != '\0' && char_index < strlen(pathname)` . So for some reason the buffer variable is filled with other data ... even if it should not since we did `*buffer = 0;` ... this is really strange. – Sebastian Karlsson Feb 11 '19 at 11:40
  • 1
    @SebastianKarlsson - I found the issue, you needed one more test `if (current_char != '\0')` at the end of your main loop before advancing `char_index++;` once again which could advance it past the *nul-terminator` if already set to that at the end of the prior loop (another reason why avoiding multiple nested loops is a good idea) – David C. Rankin Feb 11 '19 at 16:15
2

You strcat something to buffer but buffer has never been initialized. strcat will first search for the first null character and then copy the string to concatenate there. You are now probably overwriting memory that is not yours.

Before the outer while loop do:

    *buffer= '\0';
Paul Ogilvie
  • 25,048
  • 4
  • 23
  • 41
1

There are 2 main problems in your code:

  • the arrays allocated by malloc() are not initialized, so you have undefined behavior when you call strlen(buffer) before setting a null terminator inside the array buffer points to. The program could just crash, but in your case whatever contents is present in the memory block and after it is retained up to the first null byte.
  • just before the end of the outer loop, you should only take the next character from the path if the current character is a '/'. In your case, you skip the null terminator and the program has undefined behavior as you read beyond the end of the string constant. Indeed, the parse continues through another string constant "buffer(%s)\n" and through yet another one "hello". The string constants seem to be adjacent without padding on your system, which is just a coincidence.

Here is a corrected version:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void parse_path(const char *pathname) {
    int char_index = 0;
    char current_char = pathname[char_index];
    char *buffer = calloc(1, 1);
    char *current_char_str = calloc(1, 1);

    while (current_char != '\0' && current_char != 11) {
        if (char_index == 0 && current_char == '/') {
            char_index++; current_char = pathname[char_index];
            continue;
        }
        while (current_char != '/' && current_char != '\0') {
            current_char_str[0] = current_char;
            current_char_str[1] = '\0';

            buffer = (char *)realloc(buffer, strlen(buffer) + 2);
            strcat(buffer, current_char_str);

            char_index++; current_char = pathname[char_index];
        }
        if (strlen(buffer)) {
            printf("buffer(%s)\n", buffer);
            current_char_str[0] = '\0';
            buffer[0] = '\0';
        }
        if (current_char == '/') {
            char_index++; current_char = pathname[char_index];
        }
    }
}

int main(int argc, char *argv[]) {
    parse_path("this/is/a/path/hello");
    printf("hello\n");
    return 0;
}

Output:

buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
hello

Note however some remaining problems:

  • allocation failure is not tested, resulting in undefined behavior,
  • allocated blocks are not freed, resulting in memory leaks,
  • it is unclear why you test current_char != 11: did you mean to stop at TAB or newline?

Here is a much simpler version with the same behavior:

#include <stdio.h>
#include <string.h>

void parse_path(const char *pathname) {
    int i, n;

    for (i = 0; pathname[i] != '\0'; i += n) {
        if (pathname[i] == '/') {
            n = 1;  /* skip path separators and empty items */
        } else {
            n = strcspn(pathname + i, "/");  /* get the length of the path item */
            printf("buffer(%.*s)\n", n, pathname + i);
        }
    }
}

int main(int argc, char *argv[]) {
    parse_path("this/is/a/path/hello");
    printf("hello\n");
    return 0;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189