I'm doing some C puzzle questions. In most cases I am able to find the right answer, but I am having problems with this one. I found the right answer by running the code through a compiler, but I don't understand the reason for it.

Have a look at the code:

char c[] = "abc\012\0x34";

What would strlen(c) return, using a Standard C compiler?

My compiler returns 4 when what I expected was 3.

I thought strlen() would search for the first occurrence of the null character, but somehow the result is one more than I expected.

Any idea why?

Nathaniel Ford
Prz3m3k

1 Answer

Let's write

char c[] = "abc\012\0x34";

with single characters:

char c[] = { 'a', 'b', 'c', '\012', '\0', 'x', '3', '4', '\0' };

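The first \0 you see is the start of the octal escape sequence \012, which extends over the following octal digits, so it does not terminate the string. The first real null character is the \0 in front of x34. Four characters ('a', 'b', 'c', '\012') precede it, which is why strlen(c) returns 4.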
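As a quick check of the breakdown above, here is a minimal sketch in C11. The helper name `puzzle_length` is made up for illustration; only the array itself comes from the question.

```c
#include <assert.h>  /* static_assert (C11) */
#include <string.h>  /* strlen, size_t */

/* The array from the question. */
const char c[] = "abc\012\0x34";

/* 'a', 'b', 'c', '\012', '\0', 'x', '3', '4', plus the implicit '\0'. */
static_assert(sizeof c == 9, "the array holds nine elements");

/* Hypothetical helper: the length as strlen sees it. */
size_t puzzle_length(void)
{
    /* strlen stops at c[4], the first null character in the array. */
    return strlen(c);
}
```

Calling `puzzle_length()` should give 4, while `sizeof c` still counts everything after the embedded null character.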

Octal escape sequences are specified in section 6.4.4.4 of the standard (N1570 draft):

octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit

They consist of a backslash followed by one, two, or three octal digits. In paragraph 7 of that section, the extent of octal and hexadecimal escape sequences is given:

7 Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.

Note that while the length of an octal escape sequence is limited to at most three octal digits (thus "\123456" consists of five characters, { '\123', '4', '5', '6', '\0' }), hexadecimal escape sequences have unlimited length:

hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit

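and thus "\x123456789abcdef" consists of only two characters ({ '\x123456789abcdef', '\0' }). On any ordinary implementation that value is far outside the range of a char, which the standard treats as a constraint violation, so expect a diagnostic from the compiler.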
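The difference between the two kinds of escape can be sketched as follows. The array and function names are hypothetical. The split literal relies on escape sequences being converted before adjacent string literals are concatenated, which is the usual way to end a hexadecimal escape early.

```c
#include <assert.h>  /* static_assert (C11) */
#include <string.h>  /* strlen, size_t */

/* Octal: only three digits are consumed, so '4', '5', '6' are ordinary characters. */
const char octal_text[] = "\123456";
static_assert(sizeof octal_text == 5, "one escape, three digits, one terminator");

/* Hex: written as "\x41BC" this would be a single out-of-range escape.
   Splitting the literal ends the escape after \x41. */
const char hex_split[] = "\x41" "BC";
static_assert(sizeof hex_split == 4, "one escape, 'B', 'C', one terminator");

/* Hypothetical helpers returning the run-time string lengths. */
size_t octal_length(void) { return strlen(octal_text); }
size_t hex_split_length(void) { return strlen(hex_split); }
```

Here `octal_length()` should return 4 and `hex_split_length()` should return 3.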

Daniel Fischer
  • I'm sure you can find some comfort in your remaining seventy thousand rep. – cmh Jan 10 '13 at 18:14
  • @effeffe, This has the potential to end up as a gold votes badge. Is that good enough? – chris Jan 10 '13 at 18:15
  • @chris It sure looks like it can get me silver, but gold would be embarrassing. – Daniel Fischer Jan 10 '13 at 18:16
  • @DanielFischer, I haven't seen anything like this on SO. It fits a pretty good niche. – chris Jan 10 '13 at 18:18
  • If `\012` is octal, then why is `\0x34` not hexadecimal? I don't get it. – Grijesh Chauhan Jun 14 '13 at 17:07
  • @GrijeshChauhan Because hexadecimal escape sequences start with `\x`, not `\0x`. A hexadecimal escape sequence would be `'\x34'`. In `"...\0x34..."`, the `'\0'` is an octal escape sequence, followed by the three ordinary characters `'x'`, `'3'` and `'4'`. I don't know why hexadecimal escape sequences start with `\x` and not `\0x`; I suppose it's easier to parse if the type of escape sequence is determined immediately after the backslash. – Daniel Fischer Jun 14 '13 at 17:17
  • One might also be interested in [when did C++ compilers start considering more than two hex digits in string literal character escapes?](http://stackoverflow.com/q/5784969/459391). It seems to be about `g++`, but `gcc` produces the same warning as well. – Sadeq Dousti Sep 22 '16 at 15:13
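
To illustrate the distinction raised in the comments between `\x34` and `\0x34`, here is a small sketch. The names are invented for this example and are not from the original post.

```c
#include <assert.h>  /* static_assert (C11) */
#include <string.h>  /* strlen, size_t */

/* \x34 is a single hexadecimal escape with the value 0x34. */
const char with_hex_escape[] = "abc\012\x34";
static_assert(sizeof with_hex_escape == 6, "five characters plus terminator");

/* \0x34 is the octal escape \0 followed by 'x', '3', '4'. */
const char with_null_then_text[] = "abc\012\0x34";
static_assert(sizeof with_null_then_text == 9, "eight characters plus terminator");

/* Hypothetical helpers returning the lengths strlen reports. */
size_t hex_variant_length(void) { return strlen(with_hex_escape); }
size_t null_variant_length(void) { return strlen(with_null_then_text); }
```

The first string has no embedded null character, so `hex_variant_length()` should return 5, while `null_variant_length()` should return 4 as in the question.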