1

I wrote a functions that tries to join two strings, using dynamic memory allocation. I can understand that it could be easer if I used a defined array of char, but I would to learn how manage memory in C.

The problem is simple ... Running the program with valgrind I found different error like these:

  • Invalid write of size 1
  • Invalid read of size 1
  • Address .... is 0 bytes [after|inside] a block of size 4 alloc'd.

I tried to find out where the problem could be, I did, but I didn't understand what's wrong.

It's very important to me to solve this problem, so I can go on with my homework and check the errors.

That's the code

void _join(char *s1, char *s2)
{
    int i = 0;
    int len = strlen(s1);
    while (i < strlen(s2)){
        s1 = realloc(s1, (len + i)*sizeof(char));
        s1[len + i]= s2[i];
        i++;
    }
    // I insert that after, but I don't know if it's necessary
    s1 = realloc(s1, (len + i + 1)*sizeof(char));
    s1[len + i] = '\0';
}

The code goes from line 48 to line 58.

I call the function from the main in this way

_join(s1, s2);

Where s1 and s2 are given by input. Then I call free function on s1 and s2, 'cause I created there with malloc.

A snippets of valgrind output

Before: hello
==11005== Invalid write of size 1
==11005==    at 0x109322: _join (changeJOIN.c:53)
==11005==    by 0x1091E9: main (changeJOIN.c:22)
==11005==  Address 0x4a41be4 is 0 bytes after a block of size 4 alloc'd
==11005==    at 0x4839D7B: realloc (vg_replace_malloc.c:826)
==11005==    by 0x1092FB: _join (changeJOIN.c:52)
==11005==    by 0x1091E9: main (changeJOIN.c:22)
==11005==

... Other error ...

==11005==  
==11005== HEAP SUMMARY:
==11005==     in use at exit: 9 bytes in 1 blocks
==11005==   total heap usage: 17 allocs, 17 frees, 2.109 bytes allocate
==11005==
==11005== 9 bytes in 1 blocks are definitely lost in loss record 1 of 1
==11005==    at 0x4839D7B: realloc (vg_replace_malloc.c:826)
==11005==    by 0x10935B: _join (changeJOIN.c:56)
==11005==    by 0x1091E9: main (changeJOIN.c:22)
CoffeeTableEspresso
  • 2,614
  • 1
  • 12
  • 30
lmriccardo
  • 87
  • 8
  • Rather than multiple calls to `realloc`, you could `realloc(s1, len + sizeof(s2) +1);` and then loop (or `memcpy`) to copy `s2` into `s1`. Save a lot of time. – user4581301 May 23 '19 at 21:43
  • You're allocating `N` bytes and writing past that by one. What's the largest index you can use with an array that size? – Shawn May 23 '19 at 21:43
  • 1
    First, your function does not return a `char`, despite its prototype. Second, it is very inefficient. Third, and most important, `realloc` invalidates the original value of `s1` but the change does not propagate to the caller. A simple solution: return `s1` as `char*` and free it at the caller's. – DYZ May 23 '19 at 21:43
  • 1
    Worse, it might use the same storage allocated for `s1` after the `realloc`, making you think your program works when it doesn't. – user4581301 May 23 '19 at 21:46
  • I forgot about `strcat`. It's another simple solution to appending `s2` to the resized `s1` – user4581301 May 23 '19 at 21:47
  • my answer should cover all the major issues with the code posted in the question, and not have memory issues. – CoffeeTableEspresso May 23 '19 at 21:49
  • I think the C++ and bash tags aren't appropriate for this question, since this is pure C and has nothing to do with bash. – CoffeeTableEspresso May 23 '19 at 21:54
  • I know `strcat` but I wanted to write my own strcat function @user4581301 – lmriccardo May 23 '19 at 21:54
  • I could be misreading that comment, but just in case, remember that `strcat` does NOT resize the receiving buffer. – user4581301 May 23 '19 at 21:58
  • @user4581301 Just I didn't want to use strcat ... – lmriccardo May 23 '19 at 22:03
  • Note that you should not, in general, create function, variable, tag or macro names that start with an underscore. Part of [C11 §7.1.3 Reserved identifiers](https://port70.net/~nsz/c/c11/n1570.html#7.1.3) says: — _All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use._ — _All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces._ See also [What does double underscore (`__const`) mean in C?](https://stackoverflow.com/a/1449301) – Jonathan Leffler May 24 '19 at 04:27

1 Answers1

3

cleaner implementation:

char *_join(char *s1, char *s2)
{
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    s1 = realloc(s1, len1 + len2 + 1); // +1 for null terminator.
    memcpy(s1 + len1, s2, len2 + 1);

    return s1;
}

This should fix your valgrind issues, and is much cleaner.

Make sure you call it as s1 = _join(s1, s2), because the realloc will invalidate s1.

CoffeeTableEspresso
  • 2,614
  • 1
  • 12
  • 30
  • Worth noting that a preceding underscore on an identifier in the global namespace is forbidden outside of the implementation. As far as bugs go, It rarely bites but is a really nasty surprise with utterly bizarre results when it does happen. More on that in [What are the rules about using an underscore in a C++ identifier?](https://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier) – user4581301 May 23 '19 at 21:56
  • I have to see about memcpy function, but I understand thanks. Just ... Can't I do that without returning s1? – lmriccardo May 23 '19 at 22:00
  • 1
    @ABsintioLM You could `void_join(char *& s1, char *s2)` or `void _join(char ** s1, char *s2)` and pass `s1`'s pointer by reference so that you can update it, but otherwise the `s1` inside `_join` is a copy of the argument used by the caller. Updating the copy does nothing to update the argument. Sounds silly, I know, but `char *s1` passes the buffer pointed to by reference, but the pointer itself is still passed by value. – user4581301 May 23 '19 at 22:03
  • 1
    @ABsintioLM you need to return s1 in this case, since realloc could decide to allocate the larger chunk of memory somewhere else, in which case the pointer is no longer vlaid – CoffeeTableEspresso May 23 '19 at 22:05
  • @CoffeeTableEspresso Okay, so realloc does not allocate continuos memory. But theoretically, shouldn't it maintain current values but reserving more memory? Why is the value of the original s1 invalidated by `realloc`? – lmriccardo May 23 '19 at 22:16
  • 2
    If the current block of memory allocated can't be grown, realloc will need to allocate one somewhere else. It does copy the contents over, but the address will change. – Lee Daniel Crocker May 23 '19 at 22:18
  • 2
    @ABsintioLM imagine `s1` is 10 characters long, stored at address 1000. Now imagine you have another string `s` stored at address 1010. now, say you try to realloc `s1` to be 20 characters long. It can't use the same memory location it had before, because then it would overlap with `s`, so it needs to find memory somewhere else. – CoffeeTableEspresso May 23 '19 at 22:19
  • @CoffeeTableEspresso Okay, rigth. Thanks a lot, now I get it. – lmriccardo May 23 '19 at 22:26