My understanding is that the ASCII characters found in the range from 0x00 to 0x1f were included with Teletype machines in mind. In the modern era, many of them have become obsolete. I was curious as to which characters might still be found in a conventional string or file. From my experience programming in C, I thought those might be NUL, LF, TAB, and maybe EOT. I'm especially curious about BS and ESC, as I thought (similar to shift or control maybe) that those might be handled by the OS and never really printed or be included in a string. Any amount of insight would be appreciated!
2 Answers
Out of the characters between hexadecimal 00 and 1F, the only ones you are likely to encounter frequently are NUL (`0x00 = \0`), TAB (`0x09 = \t`), CR (`0x0D = \r`), and LF (`0x0A = \n`). Of these, NUL is used in C-like languages as a string terminator, TAB is used as a tab character, and CR and LF are used at the end of a line. (Which one is used is a complicated situation; see the Wikipedia article Newline for details, including a history of how this came to be.)
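To see which of these characters actually occur in a given file or buffer, a small tally is enough. The following is only a sketch: `ctrl_counts` and `count_ctrl` are hypothetical names made up for illustration, and the length is passed explicitly because an embedded NUL would otherwise end a C string early.

```c
#include <stddef.h>

/* Hypothetical helper type: tallies of control characters below 0x20. */
struct ctrl_counts {
    size_t nul, tab, lf, cr, other;
};

/* Count NUL, TAB, LF, and CR separately; every other byte below 0x20
 * goes into `other`. Bytes at 0x20 and above are ignored. */
struct ctrl_counts count_ctrl(const unsigned char *buf, size_t len)
{
    struct ctrl_counts c = {0, 0, 0, 0, 0};

    for (size_t i = 0; i < len; i++) {
        switch (buf[i]) {
        case 0x00: c.nul++; break;
        case 0x09: c.tab++; break;
        case 0x0A: c.lf++;  break;
        case 0x0D: c.cr++;  break;
        default:
            if (buf[i] < 0x20)
                c.other++;
            break;
        }
    }
    return c;
}
```

Running this over the bytes read from a typical text file should show mostly LF (plus CR on files with Windows line endings) and TAB, with `other` at or near zero.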
The following additional characters are used when communicating with VT100-compatible terminal emulators, but are rarely found outside that context:
- BEL (`0x07 = \a`), which causes a terminal to beep and/or flash.
- BS (`0x08 = \b`), which is used to move the cursor left one position. (It is not sent when you press the backspace key; see below!)
- SO and SI (`0x0E` and `0x0F`), which are used to switch into certain special character sets.
- ESC (`0x1B = \e`, where `\e` is a non-standard escape accepted by GCC and Clang), which is sent when pressing the Escape key and various other function keys, and is additionally used to introduce escape sequences which control the terminal.
- DEL (`0x7F`), which is sent when you press the backspace key.
The rest of the nonprintable ASCII characters are essentially unused.
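As an illustration of ESC introducing an escape sequence, here is a minimal sketch that wraps text in the common "bold on" (`ESC [ 1 m`) and "reset" (`ESC [ 0 m`) sequences. `format_bold` is a hypothetical helper name, and the result only renders as bold on a terminal that understands these sequences. The portable `\x1b` spelling is used instead of `\e`.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write ESC[1m + text + ESC[0m into `out`.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the result does not fit in `cap` bytes. */
int format_bold(char *out, size_t cap, const char *text)
{
    int n = snprintf(out, cap, "\x1b[1m%s\x1b[0m", text);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Printing the buffer with `fputs(buf, stdout)` should show the text in bold on a VT100-compatible terminal emulator; redirected to a file, the ESC bytes are stored literally, which is one way these characters end up in files.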
- This is exactly all the information I was looking for. Thank you! – May 26 '16 at 04:13
- FF (`0x0C = \f`) is used as a page separator. – dan04 May 26 '16 at 04:19
- @dan04 I left that out intentionally, as it's pretty rare in modern usage. – May 26 '16 at 06:59
- Page separator is still used in several editors as a way to mark page breaks: https://www.gnu.org/software/emacs/manual/html_node/emacs/Pages.html – Prgrm.celeritas Aug 20 '20 at 16:41
"Backspace composition no longer works with typical modern digital displays or typesetting systems" Ref Backspace
Here's a related question: The backspace escape character in c unexpected behavior
Ref Unicode
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards. Therefore, ASCII can be considered a 7-bit encoding scheme for a very small subset of Unicode/UCS, and ASCII (when prefixed with 0 as the eighth bit) is valid UTF-8.
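The last point can be checked mechanically: if every byte has its high bit clear, the data is pure ASCII and therefore also valid UTF-8. A minimal sketch, with `is_ascii` as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte is in 0x00..0x7F, i.e. the
 * buffer is 7-bit ASCII and can be treated as UTF-8 unchanged. */
bool is_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] & 0x80)   /* high bit set: not ASCII */
            return false;
    }
    return true;
}
```

Note that control characters such as TAB, LF, and BS pass this check too, since they occupy the same code points in ASCII and Unicode.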
Here's another related question, about backspace in Unicode: what is the purpose of Unicode backspace u0008
Here's a good overview for C programming: how to program for unicode and UTF-8
And finally, here's the GNU (FSF.org) implementation: GNU libunistring manual
"This library provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard."
