
I've been thinking about what will happen if I assign a longer string literal to a char array of smaller size. (I understand that if I use a string literal as an initializer, I would normally leave out the size and let the compiler count the number of chars, or use strlen()+1 as the size.)

I have the following code:

#include <stdio.h>

int main(void) {
    char a[3] = "abc"; // a[2] gives an error of initializer-string for array of chars is too long
    printf("%s\n", a);
    printf("%p\n", (void *) a); // %p expects a void *
    return 0;
}

I expect it to crash, but it actually compiles without warnings and prints output. Running it under valgrind, however, gives the following error messages:

==19195== Memcheck, a memory error detector
==19195== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19195== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==19195== Command: ./a.out
==19195== 
==19195== Conditional jump or move depends on uninitialised value(s)
==19195==    at 0x4E88CC0: vfprintf (vfprintf.c:1632)
==19195==    by 0x4E8F898: printf (printf.c:33)
==19195==    by 0x4005CC: main (main.c:5)
==19195== 
==19195== Conditional jump or move depends on uninitialised value(s)
==19195==    at 0x4EB475D: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:850)
==19195==    by 0x4EB56AF: _IO_default_xsputn (genops.c:455)
==19195==    by 0x4EB32C6: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1352)
==19195==    by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195==    by 0x4E8F898: printf (printf.c:33)
==19195==    by 0x4005CC: main (main.c:5)
==19195== 
==19195== Conditional jump or move depends on uninitialised value(s)
==19195==    at 0x4EB478A: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:858)
==19195==    by 0x4EB56AF: _IO_default_xsputn (genops.c:455)
==19195==    by 0x4EB32C6: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1352)
==19195==    by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195==    by 0x4E8F898: printf (printf.c:33)
==19195==    by 0x4005CC: main (main.c:5)
==19195== 
==19195== Conditional jump or move depends on uninitialised value(s)
==19195==    at 0x4EB56B3: _IO_default_xsputn (genops.c:455)
==19195==    by 0x4EB32C6: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1352)
==19195==    by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195==    by 0x4E8F898: printf (printf.c:33)
==19195==    by 0x4005CC: main (main.c:5)
==19195== 
==19195== Syscall param write(buf) points to uninitialised byte(s)
==19195==    at 0x4F306E0: __write_nocancel (syscall-template.S:84)
==19195==    by 0x4EB2BFE: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1263)
==19195==    by 0x4EB4408: new_do_write (fileops.c:518)
==19195==    by 0x4EB4408: _IO_do_write@@GLIBC_2.2.5 (fileops.c:494)
==19195==    by 0x4EB347C: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1331)
==19195==    by 0x4E8792C: vfprintf (vfprintf.c:1663)
==19195==    by 0x4E8F898: printf (printf.c:33)
==19195==    by 0x4005CC: main (main.c:5)
==19195==  Address 0x5203043 is 3 bytes inside a block of size 1,024 alloc'd
==19195==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19195==    by 0x4EA71D4: _IO_file_doallocate (filedoalloc.c:127)
==19195==    by 0x4EB5593: _IO_doallocbuf (genops.c:398)
==19195==    by 0x4EB48F7: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:820)
==19195==    by 0x4EB328C: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1331)
==19195==    by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195==    by 0x4E8F898: printf (printf.c:33)
==19195==    by 0x4005CC: main (main.c:5)
==19195== 
abc?
0xfff0003f0
==19195== 
==19195== HEAP SUMMARY:
==19195==     in use at exit: 0 bytes in 0 blocks
==19195==   total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==19195== 
==19195== All heap blocks were freed -- no leaks are possible
==19195== 
==19195== For counts of detected and suppressed errors, rerun with: -v
==19195== Use --track-origins=yes to see where uninitialised values come from
==19195== ERROR SUMMARY: 10 errors from 5 contexts (suppressed: 0 from 0)

I think the uninitialized value/byte errors make sense: there is no room in the array for the terminating '\0', and when I print it, the last char is a garbage value.

But the last error message looks unfamiliar to me.

Address 0x5203043 is 3 bytes inside a block of size 1,024 alloc'd

I'm aware that the buffer size is defined as 1024. I'm not sure whether this error appears because of inefficient use of memory.

Also, I'm wondering where the heap allocation and free come from. Is that from the string literal?

Thanks for any help!!

(The previous title of this question may have been confusingly worded, so I changed it.)

A similar question, but in C++

myx
  • The heap allocation is probably internal to `printf()`. – Barmar Mar 02 '17 at 16:09
  • Possible duplicate of [Which is better way to initialize array of characters using string literal?](http://stackoverflow.com/questions/42049003/which-is-better-way-to-initialize-array-of-characters-using-string-literal) – msc Mar 02 '17 at 16:11
  • @rsp I think he understands why this isn't the correct way to initialize the literal. He's specifically asking for an explanation of valgrind's warnings about it. – Barmar Mar 02 '17 at 16:12
  • `I expect it to crash` [UB doesn't mean crash](http://stackoverflow.com/q/32132574/995714) http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html – phuclv Mar 02 '17 at 16:46
  • @Barmar "isn't the correct way to initialize the literal." is unclear. The string literal `"abc"` does not need initialization and `char a[3] = "abc";` is compliant C. – chux - Reinstate Monica Mar 02 '17 at 16:51
  • @chux I meant "initialize the string". It's a valid array, but it won't work as a string because of the missing null delimiter. And he acknowledged that in the question. – Barmar Mar 02 '17 at 17:08
  • @Barmar Perhaps. OP's title "assigning string literal “abc” to an array of size 3 causes valgrind error" and "... if I assign a longer string literal to a char array of smaller size. I expect it to crash ..." led me to think OP thinks `char a[3] = "abc";` itself is a problem, and that "works" to initialize an array - no problem so far. I am certain we agree, the problem is in the `printf("%s\n", a);` for as you comment, `a[]` is not a _string_, as needed. – chux - Reinstate Monica Mar 02 '17 at 17:17
  • @chux i'm sorry that my wording confused you. I was just curious about the valgrind messages. – myx Mar 02 '17 at 19:06
  • @LưuVĩnhPhúc sorry by "crash" I meant to say it won't compile. But the other two links are helpful! – myx Mar 02 '17 at 19:15
  • There's absolutely nothing that makes it not compilable. All syntax-error-free code must be compiled successfully. Whether it runs successfully or not is another matter – phuclv Mar 03 '17 at 02:28

2 Answers


assigning string literal “abc” to an array of size 3 causes valgrind error

The initialization does not cause a valgrind error. char a[3] = "abc" is fine: C allows a character array to be initialized without a terminating null character.

Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array. C11 §6.7.9 14

printf("%s", ...) expects a pointer to a null-character-terminated array. a is not one, as it lacks a null character. The code therefore attempts to access beyond a[], which is undefined behavior, and that is where the errors come from. It is not "inefficient use of memory.", but accessing out of bounds into uninitialized memory.

Instead, use one of the following, which print until a null character is found or 3 characters are printed.

printf("%.3s\n", a);
// or 
printf("%.*s\n", (int) sizeof a, a);
chux - Reinstate Monica
  • Nothing worse than downvotes without a reason - especially when the answer is correct. – KevinDTimm Mar 02 '17 at 16:38
  • @KevinDTimm Agree, a comment with a DV is more useful. Yet as I do not complain about UV without a reason, neither do I complain about DV without a reason. – chux - Reinstate Monica Mar 02 '17 at 16:47
  • The question is what the valgrind error messages mean, specifically `Address 0x5203043 is 3 bytes inside a block of size 1,024 alloc'd`. This answer is correct as far as C goes, but it does not attempt to answer the question. – melpomene Mar 02 '17 at 16:47
  • @melpomene The "Conditional jump or move depends on uninitialised value(s)" is due to the "accessing out of bounds into uninitialized memory." – chux - Reinstate Monica Mar 02 '17 at 16:48
  • @melpomene - The subject of the question is: "C: assigning string literal “abc” to an array of size 3 causes valgrind error". chux has verified this to not be true (and so is an answer to the question) – KevinDTimm Mar 02 '17 at 16:50
  • @KevinDTimm The subject is not the question itself, and it does cause a valgrind error, at least indirectly (because the code calls `printf` on it and tries to use the array as a string). OK, so the title is slightly inaccurate. But if you read the question to the end, you can see it's all about interpreting that last valgrind message. – melpomene Mar 02 '17 at 16:53
  • @melpomene OP's post is an approximation of OP's true question and is open to some interpretation. (Those who know the answer ask highly well formed questions, like teachers and lawyers.) So OP has at least 4 concerns ranging about `char a[3] = "abc"; printf("%s\n", a);` and the tools that report the issue. Our answers approach this from different, not incorrect, angles. – chux - Reinstate Monica Mar 02 '17 at 16:59

Here's my interpretation of what's going on:

You're writing to stdout, which is buffered by default. So all data goes into an internal buffer first and is then written ("flushed") to the actual underlying file descriptor.

Your a array is not a valid string, as it lacks a terminating NUL byte. The first couple of messages come from the printf internals where it tries to compute the length of the argument string by finding the terminator and copy the contents into stdout's buffer. As there is no terminator within a, the code goes out of bounds, reading uninitialized memory.

At this point the output buffer would look like:

char *buf = malloc(1024), contents:
a b c ? ? ? ?
^^^^^ ^^^^^^^

The first part (abc) was legitimately copied from a. The next part is random garbage (uninitialized bytes after a, copied into the buffer). This goes on until a NUL byte happens to occur somewhere after a, which is then treated as the end of the string (this is where copying from a stops).

Finally there's the '\n' from the format string, which is also added to the buffer:

char *buf = malloc(1024), contents:
a b c ? ? ? ? \n
^^^^^ ^^^^^^^ ^^

Then, because we encountered a '\n' and stdout is line buffered (as it is when connected to a terminal), we flush the buffer by calling write(STDOUT_FILENO, buf, N), where N is however many bytes are in use in the output buffer. This is at least 4, but the exact number depends on how many garbage bytes were copied before a '\0' was found after a.

Now, the error:

==19195== Syscall param write(buf) points to uninitialised byte(s)

This is saying that there are uninitialized bytes in the buffer passed as the second argument of write (buf).

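Apparently valgrind treats parts of the output buffer as uninitialized because the source data was uninitialized: copying garbage from A to B just means B is also garbage. Memcheck tracks this through copies and only reports it when an uninitialized value decides a conditional jump or is passed to a system call, which is why the copies themselves are not reported, while the conditional jumps inside vfprintf and the write call are.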

==19195==  Address 0x5203043 is 3 bytes inside a block of size 1,024 alloc'd

So it's saying that there's a dynamically allocated buffer (of size 1024), and the uninitialised byte(s) from the previous error were found at offset 3. Which makes sense, because offsets 0, 1, 2 contain "abc", which is perfectly valid data. But after that is where the trouble begins.

It's also saying that the block came from malloc, which was called (indirectly) from printf. This is because stdout's output buffer is created on demand, the first time you write to it, which here is the first printf call in your main.

melpomene