
Debugging the following code in GDB:

char* input = malloc(2);
input[0] = 14;
input[1] = 0;

The value of the string in memory according to GDB is:

input = "\016"

Likewise,

char* input = malloc(2);
input[0] = 16;
input[1] = 0;

input in memory is "\020".

Why is this the case? Why does ASCII value 14 map to char \016? And why does ASCII value 16 map to \020 in memory?

EDIT: To add to the confusion, consider the following code:

char* input = malloc(2);
input[0] = 20;
input[1] = 0;

Running the above code segment in GDB and using the following command:

p input

the resulting value printed is:

$1 = 0x604010 "\020"

This leads me to believe that the value of the string input is "\020".

So two different values (namely 16 and 20) appear to map to the same string contents "\020".

Peabrain
  • Confusingly, "\020" is used twice: once to represent 16 in octal (as you said), but it seems that ASCII value 20 is also represented as "\020". – Peabrain Feb 07 '21 at 22:13
  • 2
    @Peabrain `input is the string "\020"` That's not the same as `input[0] = 20;` which you wrote next. `it seems that ASCII value 20 is also represented as "\020"` No, it's not. Not sure what makes you think so, since you posted no real code, and no step-by-step of what you are actually doing. – dxiv Feb 07 '21 at 22:16
  • Just copy and paste the last code segment, then debug it in GDB. I get the value "\020" when using the command "p input". I added it in for further clarification. – Peabrain Feb 07 '21 at 22:22
  • @Peabrain ["\020" = { 16, 0 }](https://onlinegdb.com/ryYjKJ0xu). – dxiv Feb 07 '21 at 22:30
  • @Peabrain: did you paste C code into GDB? Or did you change the source code and run GDB again on the executable **without** recompiling it? That would explain the surprising behavior reported... – chqrlie Feb 07 '21 at 22:43

2 Answers


14 is written 016 in octal (base eight). The syntax '\016' uses octal for historical reasons: antique computers from the 1960s had 6-bit chars crammed into 12-bit, 18-bit or even 36-bit words, for which octal digits seemed a perfect representation of groups of 3 bits.

Traces of this can be found in the C syntax for character and string constants (borrowed from C by many languages) and in the permission flags of Unix file systems (e.g. the chmod and umask arguments).

16 is '\020', 20 is '\024' and 32 (ASCII space) is '\040' or '\x20'.
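
For illustration, here is a small self-contained C program (a sketch added to this answer, not part of the original) that prints the same values in all three notations:

#include <stdio.h>

int main(void) {
    int values[] = { 14, 16, 20, 32 };
    for (int i = 0; i < 4; i++) {
        /* %03o prints zero-padded octal, %X prints hexadecimal;
           these are three spellings of one value, not three characters */
        printf("decimal %2d = octal \\%03o = hex \\x%X\n",
               values[i], values[i], values[i]);
    }
    return 0;
}

It prints decimal 14 = octal \016 = hex \xE, and so on, matching what GDB displays.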

chqrlie
  • The C literal string octal sequence dates from 1972 on PDP-11 computers which used 8 bit characters. Earlier in DEC's history, there were other rather strange word sizes, but few of them had much effect on later designs. – wallyk Feb 07 '21 at 22:29
  • @wallyk: the PDP-11 was **the** modern machine of its time, but the Unix team at Bell Labs used PDP-8 machines before that, with a 12-bit address space. See [this answer](https://stackoverflow.com/a/13960625/4593267). – chqrlie Feb 07 '21 at 22:35

The notation \016 represents an octal escape sequence. Its value in decimal notation is 14.

From the C Standard (6.4.4.4 Character constants)

octal-escape-sequence:
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit

The compiler could also represent the value as a hexadecimal escape sequence, such as \xE.

Similarly, the octal escape sequence \020 represents the decimal number 16. The number 20 written as an octal escape sequence looks like \024.
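
As a quick check (a minimal sketch added for illustration, not part of the original answer), the octal escape, the hexadecimal escape and a direct numeric assignment all yield the same byte:

#include <stdio.h>

int main(void) {
    char a[] = "\016";      /* octal escape sequence: value 14 */
    char b[] = "\xE";       /* hexadecimal escape: also value 14 */
    char c[2] = { 14, 0 };  /* direct decimal assignment, as in the question */

    printf("%d %d %d\n", a[0], b[0], c[0]); /* prints: 14 14 14 */
    return 0;
}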

Vlad from Moscow