As I understand the rules (please correct me if I am wrong):
\ OctalDigit
Examples:
\0, \1, \2, \3, \4, \5, \6, \7
\ OctalDigit OctalDigit
Examples:
\00, \07, \17, \27, \37, \47, \57, \67, \77
\ ZeroToThree OctalDigit OctalDigit
Examples:
\000, \177, \277, \367, \377
\t, \n, and \\ do not fall under the OctalEscape rules; they belong to separate escape-character rules.
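To make the three forms concrete, here is a minimal sketch in Java. The class and constant names are my own, chosen only for illustration; the literals are the ones from the examples above.

```java
class OctalEscapeForms {
    // Form 1: backslash + one octal digit (\0 to \7).
    static final char ONE_DIGIT = '\7';       // value 7

    // Form 2: backslash + two octal digits (\00 to \77).
    static final char TWO_DIGITS = '\77';     // value 63, the character '?'

    // Form 3: backslash + a digit 0-3 + two octal digits (\000 to \377).
    static final char THREE_DIGITS = '\377';  // value 255

    // Not octal escapes: these come from the separate escape-character rules.
    static final char TAB = '\t';             // value 9
    static final char NEWLINE = '\n';         // value 10
    static final char BACKSLASH = '\\';       // value 92

    public static void main(String[] args) {
        System.out.println((int) ONE_DIGIT);     // 7
        System.out.println((int) TWO_DIGITS);    // 63
        System.out.println((int) THREE_DIGITS);  // 255
        System.out.println((int) TAB);           // 9
        System.out.println((int) NEWLINE);       // 10
        System.out.println((int) BACKSLASH);     // 92
    }
}
```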
Decimal 255 is equal to Octal 377 (use Windows Calculator in scientific mode to confirm)
Hence a three-digit octal value falls in the range \000 (0) to \377 (255).
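If you would rather check the 377 (octal) = 255 (decimal) conversion in code than in a calculator, something like the following should do. The class name and the fromOctal helper are hypothetical names of mine; Integer.parseInt and Integer.toOctalString are standard library methods.

```java
class OctalRange {
    // Parses a string of octal digits, e.g. "377", into its decimal value.
    static int fromOctal(String digits) {
        return Integer.parseInt(digits, 8);
    }

    public static void main(String[] args) {
        System.out.println(fromOctal("377"));            // 255
        System.out.println(Integer.toOctalString(255));  // 377
        System.out.println((int) '\000');                // 0, lower end of the range
        System.out.println((int) '\377');                // 255, upper end of the range
    }
}
```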
Therefore, \4715 is not a valid octal escape, because it exceeds the three-octal-digit rule. If you want the character whose code point has decimal value 4715, use the Unicode escape \u instead: \u126B (hexadecimal 126B is 4715 in decimal), since every Java char is a UTF-16 code unit.
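A small sketch of what happens in practice, with names of my own choosing. As far as I can tell from the grammar above, a string literal containing \4715 still compiles, but the compiler reads it as the two-digit escape \47 (an apostrophe, decimal 39) followed by the plain characters 1 and 5, because a leading 4 cannot start a three-digit escape. As a char literal it would not compile at all.

```java
class EscapeBeyondOneByte {
    // Unicode escape: hexadecimal 126B equals decimal 4715.
    static final char CODE_POINT_4715 = '\u126B';

    // Read as the octal escape \47 followed by the characters '1' and '5'.
    static final String SPLIT = "\4715";

    public static void main(String[] args) {
        System.out.println((int) CODE_POINT_4715);    // 4715
        System.out.println(SPLIT.length());           // 3
        System.out.println((int) SPLIT.charAt(0));    // 39, i.e. octal 47
        System.out.println(SPLIT.substring(1));       // 15
    }
}
```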
From http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html:
The char data type (and therefore the value that a Character object
encapsulates) are based on the original Unicode specification, which
defined characters as fixed-width 16-bit entities. The Unicode
standard has since been changed to allow for characters whose
representation requires more than 16 bits. The range of legal code
points is now U+0000 to U+10FFFF, known as Unicode scalar value.
(Refer to the definition of the U+n notation in the Unicode standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to
as the Basic Multilingual Plane (BMP). Characters whose code points
are greater than U+FFFF are called supplementary characters. The Java
2 platform uses the UTF-16 representation in char arrays and in the
String and StringBuffer classes. In this representation, supplementary
characters are represented as a pair of char values, the first from
the high-surrogates range, (\uD800-\uDBFF), the second from the
low-surrogates range (\uDC00-\uDFFF).
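The surrogate-pair representation described in that quote can be observed directly. This is only a sketch: the class name, the toUtf16 helper, and the choice of U+1F600 as a sample supplementary code point are mine; the Character methods are from the standard API.

```java
class SurrogatePairDemo {
    // Returns the UTF-16 code units for a code point:
    // one char inside the BMP, two (a surrogate pair) above U+FFFF.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // a supplementary character, above U+FFFF
        char[] units = toUtf16(codePoint);

        System.out.println(units.length);                         // 2
        System.out.println(Integer.toHexString(units[0]));        // d83d
        System.out.println(Integer.toHexString(units[1]));        // de00
        System.out.println(Character.isHighSurrogate(units[0]));  // true
        System.out.println(Character.isLowSurrogate(units[1]));   // true

        // Combining the pair gives back the original code point.
        System.out.println(Character.toCodePoint(units[0], units[1]) == codePoint); // true
    }
}
```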
Edited:
Anything beyond the 8-bit range of a valid octal value (larger than one byte) is language-specific. Some programming languages extend octal escapes to cover Unicode; others limit them to one byte. Java definitely does not allow larger values, even though it supports Unicode.
A few programming languages (vendor-dependent) that limit octal escapes to one byte:
- Java (all vendors) - octal integer constants begin with 0; octal escapes in string and character literals are \0 to \7, \00 to \77, and \000 to \377
- C/C++ (Microsoft) - octal integer constants begin with 0; octal string literal format \nnn (up to \377)
- Ruby - octal integer constants begin with 0; octal string literal format \nnn (up to \377)
A few programming languages (vendor-dependent) that support larger-than-one-byte octal literals:
- Perl - An octal integer constant that begins with 0; octal string literal format \nnn. See http://search.cpan.org/~jesse/perl-5.12.1/pod/perlrebackslash.pod#Octal_escapes
A few programming languages do not support octal literals:
- C# - use Convert.ToInt32(value, 8) to parse a string of base-8 digits. See also the question "How can we convert binary number into its octal number using c#?"