1

Basically:

i'm fetching characters out of a text file. Each word ends with a \n in the text file.

I want to create an char ** array that will include ONLY the first character of each word.

To do that, i'm scanning the text file and running on my chars array. If the current char is a \n replace it with a null terminator \0 when inserting to the chars array.

In that way, i'll be able to seperate between words when i'll run on that array with my char **words_arr.

I want the **words_arr to point at every first character of a word, which means one char after each \0 which indicates on an end of a word.

size_t words = 1, total_characters = 30;
char **original_address, **words_arr, *chars, character;

chars = (char *)malloc(sizeof(char) * total_characters);
words_arr = original_address = (char **)malloc(sizeof(char *) * (words);

while (i < total_characters)
    {
        character = fgetc(text_file);
        
        if (character == EOF)
        {
            break;
        }
        if (character == '\n')
        {
            *chars =  '\0';
            *words_arr = chars + 1;
            ++words;
            words_arr = (char **)realloc(original_address, sizeof(char *) * (words));
            words_arr += words-1;
        }
        else
        {
            *chars = character;
        }
        ++chars;
        ++i;
    }

Also, I have no idea how many words will I have, and that is why I use the realloc function, although i'm not sure i'm doing it right.

For now, i'm able to print the chars array and see all the words and each word starts at a new line.

But when I try to print the words array, i'm getting a segmentation fault.

I'm trying to print them using:

i = 0;
    
    chars -= total_characters;
    
    while (i < total_characters)
    {
        runner = chars + i;
        if (*runner == '\0')
        {
            printf("\n");
        }
        else
        {
            printf("%c", *runner);
        }
        ++i;
    }
    
    printf("\n\n");
    
    words_arr -= words;
    
    i = 0;
    
    while (i < words)
    {
        printf("%s", *(words_arr + i));
        ++i;
    }

As I said, the first print, or chars array successes. The second print, of words_arr returns a Segmentation fault.

Thanks.

NoobCoder
  • 513
  • 3
  • 18
  • `realloc` may invalidate the array for re-allocation `original_address`. It is bad to reuse that after that. – MikeCAT Jun 10 '21 at 20:36
  • Also casting results of `malloc()` family is [considered as a bad practice](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). – MikeCAT Jun 10 '21 at 20:37
  • If I'm understanding this correctly, you have read the whole file into memory? If so, just run through everything and count `\n`:s (and change them to `\0` at the same time). That way you know how much to allocate and can do _one_ allocation only. Then run it through a second time to assign the pointers. – Ted Lyngmo Jun 10 '21 at 20:39
  • @MikeCAT Actually I had no idea it's considered as a bad practice! wow. I thought exactly the opposite. That I should cast results of `malloc()` in order to be sure i'm doing everything right and prevent follow-up errors. – NoobCoder Jun 10 '21 at 20:39
  • @TedLyngmo I want it to be done in one loop. I don't want to run more than one time on the text file because it has more than 100,000 characters (the `total_characters = 30` here is just for the test of the code) – NoobCoder Jun 10 '21 at 20:40
  • Running through 100,000 characters 2 times is often a lot faster than doing a lot of `realloc`s where the data often has to be moved. – Ted Lyngmo Jun 10 '21 at 20:41
  • @TedLyngmo Thanks for your help Ted. But as I said, my goal here (for educational purposes) is to understand what is wrong with my code, to manage to do that in one loop and to improve my lack of understanding in the `realloc` errors I might did. – NoobCoder Jun 10 '21 at 20:44
  • @NoobCoder Ah, ok, I see. – Ted Lyngmo Jun 10 '21 at 20:45
  • 1
    As a minor point, please note that `EOF` is not a character; i.e. you need an `int` in order to hold all characters *or* `EOF`. This is why the return type of `fgetc()` is *not* `char`. – unwind Jun 10 '21 at 21:01
  • will `if (int)character = EOF)` work? Or do I **have** to create a new `int` just for this check? – NoobCoder Jun 10 '21 at 21:07
  • 1
    I suggest that you make small functions to manage the memory. It becomes easier to test if every function does the right thing then. It could look like this: [example](https://godbolt.org/z/x55Ezecf4) – Ted Lyngmo Jun 10 '21 at 21:18
  • @NoobCoder No, that won't work. You need a variable of type int. But you don't need a separate one, just have one of type `int` and be happy with. When needed as `char` it is converted back implicitly. – Aconcagua Jun 10 '21 at 21:32
  • @Aconcagua, No, duplicating the array size is not necessary on a modern `malloc`. If calling `realloc` with increments of 1 is simpler, do that. Increments of 1 is at most double the runtime than incrementing in larger increments, even doubling, in practical applications. I have done the tests. That is when doing virtual nothing with the allocated memory. Typically the overhead of `realloc` should be small compared to what you do with the memory. – HAL9000 Jun 11 '21 at 01:48
  • @TedLyngmo, calling `realloc` is quite fast, while doing context switches when reading files is slow (compared to `realloc`). When reading the file twice you get twice the context switches, and your program won't work with non-seekable streams. – HAL9000 Jun 11 '21 at 01:55
  • @HAL9000 I didn't suggest reading the file twice. I assumed the file to be read into memory already and I suggested looping through it in memory twice to find the number of `\n` to only do one `malloc` instead of doing `realloc`s which is not so fast when it can't just extend the already allocated segment but has to move the data. – Ted Lyngmo Jun 11 '21 at 04:18
  • @Aconcagua, Not every system has a `malloc`, some only give you max `64k` on each allocation. Other times you have a `malloc` but real-time constraints prevent its use. So to be totally portable, you shouldn't use `malloc` at all. But it is OK not sacrifice simplicity for premature optimizations, and limit portability to common desktop systems made the last 10 years. It is easy to change the reallocation strategy later if that becomes necessary. – HAL9000 Jun 11 '21 at 14:14
  • @TedLyngmo, Keeping the file in memory and doing two passes on it has both memory and runtime costs. A tight loop may suddenly become much slower if cache/page misses starts creeping in. Modern `realloc` is quite fast. I don't think the extra complexity would be worth it, both in computing and programming costs. – HAL9000 Jun 11 '21 at 14:23
  • @HAL9000 Two passes compared to log reallocational moves. I'd go for stable. But - I did not test it. I assume it'll be better. – Ted Lyngmo Jun 11 '21 at 21:45
  • @HAL9000 OK – C++ analogy: Using `insert` + result of pair on `std::map` instead of `find` + `operator[]` is then premature optimisation, too... – Aconcagua Jun 12 '21 at 07:47
  • @Aconcagua, not sure of the cost of `std::map` or your analogy. But I would say that a good analogy of similar premature optimization in c++, is to either to keep the file in memory, do a first pass so we can do a `std::vector<>::reserve` before the second pass; or do only one pass on the file, and do multiple `std::vector<>::reserve` , doubling the reserved space each time. It is easier to do a simple `pushback`, and let the implementation handle the gory details. Then if you after profiling discover that this is not fast enough, you can try to speed optimize. – HAL9000 Jun 12 '21 at 09:30
  • @Aconcagua, ... that is what `pushback` does for you. My point is that it is simpler to just reimplement `pushback` than reimplementing `pushback` *and* what you think is a better `realloc` than builtin `realloc`. Worry about reimplementing `realloc` later, especially since profiling most likely will show that you don't have to. – HAL9000 Jun 12 '21 at 10:57
  • @Aconcagua, I agree this discussion has gone *way* off topic already, and I also think this will go deep down the rabbit hole if we continue. But I don't see any point in revising history by deleting anything. I hope other people have fund the conversation entertaining... – HAL9000 Jun 12 '21 at 13:42

1 Answers1

1

This part:

words_arr = (char **)realloc(original_address, sizeof(char *) * (words));

is wrong (for a number of reasons).

When realloc returns with succes (i.e. new memory allocated), the value of original_address is no longer valid. But in your code you keep use original_address again inside the loop.

You need to update original_address after each realloc so it is valid for next realloc.

Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • but in the next iteration I should update `words_arr` not in the beginning but after the last updated element. So should I do something like that : `words_arr = realloc(original_address,.....)` and then `original address = words_arr` and then `words_arr += words` so it'll be ready for the next iteration in the right place? – NoobCoder Jun 10 '21 at 21:00
  • well I tried the following this: `original_address = (char **)realloc(original_address, sizeof(char *) * (words));` and `words_arr = original_address + words - 1;` and then `while (i < words` : `printf("%s", *(original_address + i));` and i'm still getting the `seg fault` – NoobCoder Jun 10 '21 at 21:09
  • 1
    @NoobCoder The correct way to use `realloc` is: `char** tmp = realloc(..); if(tmp) { original_address = tmp; } else { /* error handling */ }`: You *need* to check the return value before assigning back as reallocation might fail. If you don't check, you lose the original pointer to the still valid memory, and if you don't have yet another, you have a memory leak. – Aconcagua Jun 10 '21 at 21:26