2

Here's a basic example:

#include <all the basic stuff>

int main(void) {
    char *name = (char *) malloc(2 * sizeof(char));
    if(name == NULL) {
        fprintf(stderr, "Error: Unable to allocate enough memory!\n");
        return EXIT_FAILURE;
    }
    strcpy(name, "Bob Smith");
    printf("Name: %s\n", name);
    free(name);
    return EXIT_SUCCESS;
}

Because I'm only allocating 2 bytes of information (2 chars), there should be some sort of error when I perform strcpy, right? This doesn't happen, instead it just copies the string over, prints it out, frees the memory and exits successfully. Why does this happen, and how can I use malloc correctly?

Eli
  • 327
  • 4
  • 14
  • 8
    No; you get undefined behaviour, and one valid option for undefined behaviour is "it appears to work as expected" (which means "until it doesn't", which always happens at the most inconvenient time, such as when the boss and a big customer are watching the demo, or immediately after the product has been shipped to production). Since `malloc()` may have allocated a 16-byte space (of which you can only legitimately access 2 bytes), you may be all clean. Or the `free` may have corrupted your memory allocation system, but since you don't do any more, no-one notices. – Jonathan Leffler Sep 20 '15 at 18:07
  • 1
    That is undefined behavior. C does not hold your hand when it comes to memory access, etc. You want to overwrite a bunch of memory, no problem! You could use strncpy() to limit the #bytes copied. – OldProgrammer Sep 20 '15 at 18:08
  • Note that [`valgrind`](http://valgrind.org/) or almost any other memory abuse checker (even a debugging `malloc()` — check your man pages) will spot the problem and complain, legitimately. – Jonathan Leffler Sep 20 '15 at 18:11
  • 1
    "there should be some sort of error when I perform strcpy, right?" No. Because, as stated in the very first paragraph of strcpy man page: "the destination string dest must be large enough to receive the copy. Beware of buffer overruns! (See BUGS.)" In other words: *it's your responsibility*. You'd better use strncpy. – jbm Sep 20 '15 at 18:12
  • 2
    Two people suggested using `strncpy()`; I object to that suggestion because `strncpy()` has two bad behaviours for most people. First, and by far the most important, especially in this context, is that it does not null-terminate strings when it doesn't copy the whole string. The second is that if you have a big target space (say 2 KiB) and you copy a small string into that (`strncpy(space, 2048, "hi")`), then `strncpy()` will write 2046 null bytes, not just the one that's needed to terminate the string. Only use `strncpy()` if you're aware of these issues and can live with the consequences. – Jonathan Leffler Sep 20 '15 at 18:22
  • 1
    And again, don't cast `malloc()` return value! – Cong Ma Sep 20 '15 at 18:53
  • strcpy() does not check that the receiving buffer is large enough to contain the sending buffer. You have to make that check before calling strcpy(), if your really bothered about having to code carefully, then you could use: `size_t strlcpy(char *dest, const char *src, size_t size); ` – user3629249 Sep 21 '15 at 21:18

7 Answers7

5

Your program invokes undefined behavior.

Undefined behavior is behavior that's out of the specification of the language. By definition, this means that you are not guaranteed to get any kind of well-defined behavior (such as an error). The program is explicitly invalid.

When you use strcpy, the function simply assumes that the buffer you pass to it is large enough to hold the string you want to copy. If the assumption is wrong, it attempts to write in an area out of the buffer. And if this happens, the program falls into this case of the C specification, in J.2 Undefined behavior:

The behavior is undefined in the following circumstances:

  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object

Thus, to use strcpy correctly, you must manually ensure that the above assumptions about the string's length and the buffer's length hold. To do that, an easy way is to keep the buffer's length saved somewhere, calculate the length of the string you want to copy, and compare them.

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t bufferSize = 2 * sizeof(char);       
    char *name = malloc(bufferSize);
    if(name == NULL) {
        fprintf(stderr, "Error: Unable to allocate enough memory!\n");
        return EXIT_FAILURE;
    }
    size_t length = strlen("Bob Smith");
    if(length + 1 > bufferSize) {
        fprintf(stderr, "Error: The target buffer is too small!\n");
        return EXIT_FAILURE;
    }
    strcpy(name, "Bob Smith");
    printf("Name: %s\n", name);
    free(name);
    return EXIT_SUCCESS;
}

As an unrelated side note, you'll notice that I didn't cast the result of malloc, because a void* is implicitly convertible to char*.


As a final note:

This aspect of C may sound impractical when you're trying to ensure the correctness of your code (either because you're learning the language or because you intend to release the software).

This is why there are some tools that offer to give you an error whenever your program does something invalid. Valgrind is one such tool (as was mentioned by Jonathan Leffler in the comments).

Community
  • 1
  • 1
Theodoros Chatzigiannakis
  • 28,773
  • 8
  • 68
  • 104
  • I like this answer, but why would I need *name in the first place if *strData already has all the information I need? – Eli Sep 20 '15 at 18:33
  • @DonkeyCore One reason is to demonstrate the usage of `malloc` and `strcpy`. If I just did `char *name = "Bob Smith"` I would have missed the point of your question, wouldn't I? Second, in my code, `strData` points to a read-only string, so it's not quite the same as `name`. For example, attempting to write to a read-only string is also undefined behavior. Anyway, I've changed it back to the way you had it, to avoid confusion. – Theodoros Chatzigiannakis Sep 20 '15 at 18:38
  • It's fine, I was just wondering about the point of having these dynamic memory functions when we could just do `char name[] = getInput()`. – Eli Sep 20 '15 at 18:41
  • @DonkeyCore I see. It's simply because you can only solve a limited scope of problems without using dynamic memory (and writable buffers). – Theodoros Chatzigiannakis Sep 20 '15 at 18:53
  • Alrighty, thanks for your time, help, references, and great example :) – Eli Sep 20 '15 at 19:04
4
  1. malloc will return null if it failed to allocate memory e.g. your system is out of memory. That's unlikely for 2 bytes!
  2. If you copy more bytes than you allocated, you get undefined behaviour. And that undefined behaviour may be that your program behaves as expected!
  3. As a more general note on the "correct" use of malloc that you ask about, I would recommend char *name = malloc(2 * sizeof(*name));. It's more concise, it will not hide an error if you forget to include stdlib.h and it will be easier to change the type of name in the future.
  4. Regarding safe use of strcpy you shouldn't replace it with strncpy as it's unsafe itself if the buffer is not big enough (not null-terminating) and potentially inefficient. Check if your system has strcpy_s or strlcpy.
Manos Nikolaidis
  • 21,608
  • 12
  • 74
  • 82
4

You get an error report if you compile and run it with AddressSanitizer:

$ gcc -g a.c -Wall -Wextra -fsanitize=address
$ ./a.out
=================================================================
==3362==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000eff2 at pc 0x7f9ff2b02dc4 bp 0x7fffe9190650 sp 0x7fffe918fdf8
WRITE of size 10 at 0x60200000eff2 thread T0
    #0 0x7f9ff2b02dc3 in __asan_memcpy (/lib64/libasan.so.2+0x8cdc3)
    #1 0x4009df in main /home/m/a.c:11
    #2 0x7f9ff26d678f in __libc_start_main (/lib64/libc.so.6+0x2078f)
    #3 0x400898 in _start (/home/m/a.out+0x400898)

0x60200000eff2 is located 0 bytes to the right of 2-byte region [0x60200000eff0,0x60200000eff2)
allocated by thread T0 here:
    #0 0x7f9ff2b0ea0a in malloc (/lib64/libasan.so.2+0x98a0a)
    #1 0x400977 in main /home/m/aa.c:6
    #2 0x7f9ff26d678f in __libc_start_main (/lib64/libc.so.6+0x2078f)

SUMMARY: AddressSanitizer: heap-buffer-overflow ??:0 __asan_memcpy
4566976
  • 2,419
  • 1
  • 10
  • 14
2

There are enough answers, I'll try to get them to a more basic level and give you the following as the root cause:

C does not include any bounds checking whatsoever.

The benefit is, the C runtime is very small and efficient. The backdraw is, you often will not get any error messages for things like you did in your question ... just (maybe long after the error itself) wrong behaviour or even crashes.

2

Why can you write more data to a buffer you malloc() than the size of the buffer? Outside of the fact that you can't predict the results of undefined behavior, there actually is an explanation as to why it sometimes appears to be totally safe to write more bytes to a malloc()'d buffer than the number of bytes you asked for.

It's because of the implications of the requirements set in Section 7.20.3 Memory management functions of the C standard:

The order and contiguity of storage allocated by successive calls to the calloc, malloc,and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).

Note the italicized text: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object".

Those alignment restrictions mean that malloc() and related functions effectively have to hand out memory in aligned chunks, and any successful call to malloc() will be very likely to actually return memory that's in exact multiples of the alignment restrictions malloc() is operating under.

On an x86 machine, which IIRC has an 8-byte alignment restriction, a call such as malloc( 11 ) will likely return a pointer to a buffer that's in fact 16 bytes.

That's one reason why overwriting the end of a malloc()'d buffer sometimes seems to be harmless.

Andrew Henle
  • 32,625
  • 3
  • 24
  • 56
0

Well strcpy(name, "Bob Smith"); will invoke undefined behaviour. name is not enough to store "Bob Smith". Solution would be -

 char a[]="Bob Smith";
 char *name = malloc(strlen(a)+1);   //you should not cast return of malloc
 if(name == NULL) { 
    fprintf(stderr, "Error: Unable to allocate enough memory!\n");
    return EXIT_FAILURE;
 }
strncpy(name,a,strlen(a));
ameyCU
  • 16,489
  • 2
  • 26
  • 41
  • In the example you showed me, why would *name be needed at all then, if `a` has all the information that's needed? – Eli Sep 20 '15 at 18:31
  • @DonkeyCore What you used is hard coded way . Then why even you need `strcpy` directly use `fgets` . – ameyCU Sep 20 '15 at 18:32
  • @DonkeyCore How did you get string `"Bob Smith"` ? – ameyCU Sep 20 '15 at 18:34
  • Well I put Bob Smith there as an example. It could have come from anywhere. I'm just wondering why I would need to make a `*name` variable, since `a[]`, the input in your example, already has the data. – Eli Sep 20 '15 at 18:38
  • @DonkeyCore I don't know what do you mean by anywhere . But if it is from input . then directly use `fgets` to read it into `name` . – ameyCU Sep 21 '15 at 03:29
-4

try:

strncpy(name, "Bob Smith", 2 * sizeof(char));
AlbertFG
  • 157
  • 13