
I'm new to programming and started learning C a few weeks ago. I have read in a book that a character array should end with \0, but when I create an array without \0, it works properly. How is that possible?

#include<stdio.h>
#include<string.h>
int main()
{
    char a[] = {'a','p','p','l','e'};

    printf("%d\n",strlen(a));
    printf("%s\n",a);
    return 0;
}

The output of the above code is

5
apple

Also, I read that char is a subset of the integer data types, but when I create the above array with the int data type, it doesn't work properly.

#include<stdio.h>
#include<string.h>
int main()
{
    int a[] = {'a','p','p','l','e'};

    printf("%d\n",strlen(a));
    printf("%s\n",a);
    return 0;
}

The output of the above code is

1
a

Why does it consider only the first element of the array?

MarianD
Vencat
  • The second example is incorrect, because a char is one byte and an int is usually 4 bytes. – purec Sep 08 '18 at 11:59
  • C strings must be null-terminated. Your code invokes UB – phuclv Sep 08 '18 at 12:00
  • @Vencat Are you getting any warning message(s) when compiling the second example? – H.S. Sep 08 '18 at 12:01
  • Possible duplicate of [String is longer than expected in C](https://stackoverflow.com/questions/33707486/string-is-longer-than-expected-in-c) – phuclv Sep 08 '18 at 12:02
  • @purec there's no such thing as "default state of memory". [Using an uninitialized variable invokes undefined behavior](https://stackoverflow.com/q/11962457/995714) – phuclv Sep 08 '18 at 12:14
  • Why? When you turn on your computer. – purec Sep 08 '18 at 12:15
  • other duplicates: [What happened when we do not include '\0' at the end of string in C?](https://stackoverflow.com/q/34995106/995714), [why printf works on non-terminated string](https://stackoverflow.com/q/4999901/995714). @purec turning on the PC != starting your process. There are tons of other things stored at that memory region before you store your variables there. Did you even bother to read the link? – phuclv Sep 08 '18 at 12:17
  • phuclv, there is not even such a thing as a process at this stage. – purec Sep 08 '18 at 12:23
  • @purec your C program is a process in an OS. Where do you think stdio writes to without a console? Read more and learn more. Even if you write firmware like a BIOS that starts before anything else, there may be static charges in memory, which result in random values that programs must zero out explicitly. And learn how to use this site: without `@` there would be no notification – phuclv Sep 08 '18 at 12:25
  • @phuclv In the above question he got an error while running his code without '\0', but in my case I didn't get any error. – Vencat Sep 08 '18 at 12:27
  • I am just answering his question "How it is possible?" And I am saying that unused memory holds zeros. I am not a pastor to curse him with UB or other scary things. – purec Sep 08 '18 at 13:18
  • @Vencat UB means anything can happen, including [*"making demons fly out of your nose"*](http://catb.org/jargon/html/N/nasal-demons.html). See the questions I linked above. [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/q/2397984/995714) – phuclv Sep 08 '18 at 14:08

6 Answers


The first half of your question is equivalent to this:

I'm new to life and started to learn about road traffic a few weeks back. I have read in a book that you should wait for the green light before entering the intersection, but when I enter the intersection without waiting, it works properly. How is it possible?

In other words, you just got lucky. It just so happened that, even though you constructed an array of characters without a proper \0 terminator, there happened to be a 0 byte in memory just after the e in apple, so it worked anyway. But it's not at all guaranteed to work, any more than it's guaranteed that you can keep crossing the street against the light and not, eventually, get hit.

Moving on to your second question, when you read that "char is a subset of integer datatype", that does not at all mean that anywhere you would ordinarily use a char, you can also use int.

Here are some characters in memory. Each of them is one byte in size:

char c1 = 'p', c2 = 'e', c3 = 'a', c4 = 'r';

    +---+                   +---+
c1: | p |               c2: | e |
    +---+                   +---+

    +---+                   +---+
c3: | a |               c4: | r |
    +---+                   +---+

Here are some ints in memory. On a modern machine, each of them is probably four bytes in size:

int i1 = 'p', i2 = 'e', i3 = 'a', i4 = 'r';

    +---+---+---+---+       +---+---+---+---+
i1: | p             |   i2: | e             |
    +---+---+---+---+       +---+---+---+---+

    +---+---+---+---+       +---+---+---+---+
i3: | a             |   i4: | r             |
    +---+---+---+---+       +---+---+---+---+

Here is an array of char, properly null-terminated:

char ca[] = { 'p', 'e', 'a', 'r', '\0' };

    +---+---+---+---+---+
ca: | p | e | a | r |\0 |
    +---+---+---+---+---+

When printf prints this string, or strlen computes its length, they start at the beginning and move along the string one byte at a time, until they find the \0.

But here is an array of int:

int ia[] = { 'p', 'e', 'a', 'r', '\0' };

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
ia: | p             | e             | a             | r             | \0            |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

But I've drawn it slightly wrongly, because in reality the three extra bytes in each int aren't filled with empty spaces; they're filled with zero bytes. (It's as if we wanted to represent the number 1 with leading zeroes, that is, as 0001.) On a typical little-endian machine, where the low-order byte comes first in memory, the more accurate picture looks like this:

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
ia: | p  \0  \0  \0 | e  \0  \0  \0 | a  \0  \0  \0 | r  \0  \0  \0 | \0  \0  \0  \0|
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

So when printf or strlen start at the beginning and process the array one byte at a time looking for the terminating \0, they find one immediately, just after the first letter.

An important point to consider here is that printf and strlen are defined to operate on arrays of char. And because of the way C works, they had no way of knowing that you had cheated and passed an array of int instead. They literally took that same memory and treated it as if it were an array of char, and so got a very different result than what you expected.

Because it's easy to make mistakes like this, good compilers will warn you if you do. For your code, my compiler gave me these warnings:

warning: incompatible pointer types passing 'int [5]' to parameter of type 'const char *'
warning: format specifies type 'char *' but the argument has type 'int *'

Those messages refer to type char *, which is pointer-to-char, because when you pass an array to a function, what actually gets passed is a pointer to the array's first element. (That's a topic for another day, but it has a lot to do with what I said about printf and strlen "literally taking that same memory and treating it as if" it were an array of characters.)

Steve Summit
  • Good point. I completely overlooked that an array of 5 chars lacks a terminator, which is strictly necessary for `strlen()`. – Scheff's Cat Sep 08 '18 at 12:14

In char a[] = {'a','p','p','l','e'};, the compiler counts the number of values you provide, which is five. Then it creates an array of five char and initializes them with those values.

Then, in printf("%d\n",strlen(a)); and in printf("%s\n",a);, the behavior is not defined by the C standard because you are required to have a zero element in the array to indicate where the end is. In the situation where you tried this, it may have happened that the memory after the a array contained a zero, resulting in the program printing “5” and “apple”. However, this will not always happen.

Additionally, the result of strlen has type size_t and ought to be printed with %zu rather than %d.

In int a[] = {'a','p','p','l','e'};, the compiler creates an array of int. When you use this in printf("%s\n",a);, you are passing a pointer to int when printf expects a pointer to char. The behavior of this is not defined by the C standard. A common result is that printf will process the bytes in the array of int as if they were an array of char, although this cannot be relied on—the actual behavior of C implementations may vary.

Since an int is wider than a char, an int containing the value of 'a' typically contains one byte with the value of 'a' and one or more bytes with the value zero. It may also contain padding bits. The order of the bytes within an int is not defined by the C standard. If the byte containing 'a' happens to be first in memory, and the following bytes are zero, printf may print “a”. However, if a byte containing zero is first, printf will see that as the end of the string and will print nothing.

Again, the behavior is not defined by the C standard. The above only explains how what you saw may have come to be printed, not what you can expect in other situations.

Eric Postpischil

Passing an int[] to strlen() is wrong; strlen() expects characters. Even if you provide something else (and switch off or ignore all of the compiler's warnings), strlen() interprets the given address as a char* (whatever it really points to).

Strictly speaking, this is undefined behavior.

Investigating a little bit, we can explore what probably happens:

char a[] = {'a','p','p','l','e'};

defines an array of 5 characters. Dumped from memory, this might look like this:

0x61 0x70 0x70 0x6c 0x65 ???? ???? ????

int a[] = {'a','p','p','l','e'};, assuming a 32-bit little-endian int, might look like this:

0x61 0x00 0x00 0x00 0x70 0x00 0x00 0x00
0x70 0x00 0x00 0x00 0x6c 0x00 0x00 0x00
0x65 0x00 0x00 0x00 ???? ???? ???? ????

Re-interpreting a[] as char* (what strlen() would do), this results in a string of length one.

However, it's still undefined behavior...

Scheff's Cat

Depending on the hardware and the implementation, an int can be 2 or more bytes long.

On a little-endian system, the first byte will be the ASCII code of 'a', and the second byte (and the following ones, up to sizeof(int)) will be zero. So any string function will treat it as a single-character string.

A big-endian system has the opposite byte order: if we interpret this int array as a char array, the first character will be zero, which terminates the string, so its length will be zero.

Your char array example is also wrong, as it has no terminating zero, and using it as a string invokes UB.

Your char array initialization should be:

char a[] = {'a','p','p','l','e', 0};

or

char a[] = "apple";

as initializing from a string literal adds the terminating null character as well.

0___________

On a 32-bit compiler, an int takes 4 bytes and a char takes 1 byte. If you pass an integer array to strlen, it scans the first byte of the first integer, which is 'a' in your case; the next 3 bytes are 0, so strlen stops at the second byte and reports a length of 1.

Mayur
  • The widths of `int` and other types are more complicated than just whether a compiler is “32-bit”, “64-bit” or something else. One should not expect that a “32-bit compiler” necessarily has particular widths for its various integer types. – Eric Postpischil Sep 08 '18 at 12:13

I have read in a book that character array should end with \0...

It is required only when you want to interpret the character array as a string. In the C language, strings are actually one-dimensional arrays of characters terminated by a null character \0.

In your first example, the char array a is simply an array of characters. You are lucky that strlen and printf have given the expected output. The strlen function returns the number of characters that precede the terminating null character. In your case, the memory just after the array a must have happened to contain 0. Hence, you are getting the expected output from strlen. For the same reason, printf also works as expected, because it writes every byte up to, but not including, the first null terminator.

In your second example, you are passing an integer pointer to strlen:

printf("%d\n",strlen(a));

The compiler must be giving a warning message for this, because the parameter type of strlen is const char * and you are passing it an int *.

Also, in the printf call you are passing an integer pointer as the argument. The %s format specifier expects a char pointer. The behavior is undefined in this case.

H.S.