
While writing C code, I tried to write my own version of strcpy, and I ran into this issue.

#include <stdio.h>
#include <string.h>

void strcpy2(char *s, char *t);

int main() {
    char a[10] = "asds";
    char b[10] = "1234567890";

    strcpy2(a, b);
    printf("Copy completed! : %s", a);
    return 0;
}

void strcpy2(char *s, char *t) {
    while ((*s++ = *t++));
}

Error: Process finished with exit code -1073741819 (0xC0000005)

Thanks to this question on SO, I learned that a string should end with '\0'. But why doesn't the above code work, even though the declaration itself doesn't cause an error? (It worked fine with char b[10] = "123456789".)

So how exactly does '\0' affect this process and eventually cause the error? (At runtime? At compile time?) All I know is that '\0' should mark the end of a string.

Hashnut
  • As an exercise, try to implement a safe strcpy with a character count: it should stop when the count exceeds n (10 in your case). See strncpy (https://en.cppreference.com/w/cpp/string/byte/strncpy). – Spock77 Sep 18 '18 at 11:25
  • The `b` array is 1 byte too short to hold the `'\0'`. Make it `char b[11]`. – i486 Sep 18 '18 at 11:25
  • Your `s` and `t` names are **highly** misleading. Please reconsider. The image you have posted is totally irrelevant. You should not post any images of code or error messages, [see here](http://idownvotedbecau.se/imageofanexception/). – n. m. could be an AI Sep 18 '18 at 11:30
  • "How exactly does '\0' affect this process?" At runtime, the loop walks over the memory that `t` points to, searching for a 0. Because there is no 0 inside `b`, it runs past the allowed region and causes an access violation. Step-by-step explanation: https://medium.com/@larissafeng/understanding-while-s-t-abb2cc518f96 – Spock77 Sep 18 '18 at 11:37
  • @n.m. Thanks for the advice! – Hashnut Sep 18 '18 at 12:03
  • I must apologise about the `s, t` thing. It's not wrong per se. Lots of people write `strcpy`-like functions with `*s=*t` or equivalent, including K&R. It just drives me nuts. People, `s` is supposed to stand for "source" and `t` for "target"! If you must have one-letter variable names, and you must have the letters in alphabetical order, use `p, q` or something! Better yet, use more sensible names like `dst` and `src`, or longer. – n. m. could be an AI Sep 18 '18 at 14:46

2 Answers


On the line `char b[10] = "1234567890";`, the string literal `"1234567890"` is exactly 10 characters plus 1 null terminator. There is no room left in the array for the terminator, so `b` is not null-terminated.

Normally, the compiler would warn you about an initializer that is too large, but this specific case is a very special pitfall. In the C standard's rules for initialization, we find this evil little rule (C17 6.7.9 §14, emphasis mine):

An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

There is no room in your case, so you don't get a null character. And because of this weird little rule, the compiler doesn't warn against it either, because the code conforms to the C standard.

Lundin
  • @Bathsheba Oh. Thanks, I had forgotten about this whole rep thing! :) As for this particular quirk, I actually didn't know about it myself until some years ago when I learnt it from SO [here](https://stackoverflow.com/questions/31296727/inconsistent-gcc-diagnostic-for-string-initialization). – Lundin Sep 18 '18 at 11:32
  • @Lundin I have a problem. I compile OP's code with: `-Wpedantic -std=c17 -Wall -Wextra -Werror -Wstrict-prototypes -Wmissing-prototypes -Wmisleading-indentation -Wduplicated-cond -Wold-style-definition -Wconversion -Wshadow -Winit-self -Wfloat-equal -Wwrite-strings -Wcast-align=strict -O0 -g -Wformat` and valgrind does not catch this problem, even with `-O0`, `-O1`, `-O2` and `-O3`. Why? [This is the output from valgrind](https://pastebin.com/raw/535Wfx8E). I use `GCC-8.0.1`. – Michi Sep 18 '18 at 11:36
  • Wow, you are a god of C! It helped a lot. – Hashnut Sep 18 '18 at 11:38
  • @Michi Because as I said, the code conforms to the C standard, so gcc doesn't give an error on purpose. Neither does clang or icc, just tested with max warnings (various embedded systems compilers do, however). I don't believe a C++ compiler would allow it though, g++ gives an error. Valgrind isn't likely to catch it since it isn't a memory leak. This evil little rule likely goes all the way back to the old Unix string format, that didn't use null terminators. Pre-standard C used to allow such strings too, before C settled and was standardized. – Lundin Sep 18 '18 at 11:45
  • @Lundin OK, but this is a major problem. What happens with `a` in the `printf()` call? Does `a` get null-terminated? If yes, why are there still 10 chars? – Michi Sep 18 '18 at 11:47
  • @Michi You get undefined behavior. If you by luck have a byte with value 0 (resembling a null terminator) after the string in memory, it will work as expected. If not, printf might print trailing garbage or crash. Or you get some access violation error from the OS. Anything can happen. – Lundin Sep 18 '18 at 11:49
  • @Lundin OP's code is clearly `UB`, and it should be caught by the compiler or Valgrind. I feel sad about this. – Michi Sep 18 '18 at 11:50
  • @Lundin `a` is not null terminated , so the `printf` call should be a problem for `valgrind` – Michi Sep 18 '18 at 11:52
  • `As for this particular quirk, I actually didn't know about it myself until some years ago when I learnt it from SO ` Do you mean that even I can gain a good amount of knowledge by answering on SO? As of now I have very little knowledge about `c`. @Lundin – KBlr Sep 18 '18 at 11:54
  • @Michi Valgrind does catch this. OP's code reports "Invalid write of size 1" within `strcpy2`. If I comment out the `strcpy2` call, it completes with no errors (since the original contents of `a` _are_ nul-terminated). If I additionally change the `printf` call to print the contents of `b`, I get "Conditional jump or move depends on uninitialized value" from inside the guts of `printf`. – zwol Sep 18 '18 at 13:54
  • @zwol Yes, I know that, but OP's actual code is not caught by the compiler or Valgrind. This is a big problem. [Code like this should never be treated as valid](https://ideone.com/e9IxU8). On my system (Linux Mint 19, Gcc-8.0.1, valgrind-3.13.0) it is possible, and it is treated as valid code [with all optimization levels turned on](https://pastebin.com/raw/R9Q1LD1H). – Michi Sep 18 '18 at 14:15

char b[10] = "1234567890"; doesn't contain a NUL-terminator so

while ((*s++ = *t++));

does not terminate correctly (formally, the program behaviour is undefined). Note that the literal "1234567890" has type char[11]; C allows you to use it to initialize a char[10], in which case the terminating null is silently dropped.

Bathsheba