7

I am testing something that calls StringTokenizer and am getting a weird conversion... forget about the fact that I should be escaping the \ in "\7767546"; I'm just curious what's going on with \11 through \77 in Java.

here is my code:

String path = "C:\\temp\\\\7800000\7767546.pdf";
String delimeter = "\\";
String[] values = new String[3];
int counter = 0;
StringTokenizer st = new StringTokenizer(path, delimeter);
while (st.hasMoreTokens()) {
    values[counter] = st.nextToken();
    System.out.println(" values[counter]" + values[counter]);
    ++counter;
}

here's the output:

values[counter]C:

values[counter]temp

values[counter]7800000?67546.pdf

If you notice, the \77 in my original String became ? ... is that a Unicode thing?

ayeen c
  • you need to escape (i.e. double) `\\` in strings –  Mar 20 '14 at 20:52
  • yes, RC, thank you... I don't know if you read my disclaimer above about forgetting the fact that I should be escaping the \, but I'm really more curious about what it is being converted to... octal? ASCII? – ayeen c Mar 21 '14 at 14:42

5 Answers

7

Octal 77 (decimal 63) is the ASCII code for "?". Java reads `\77` in a string literal as an octal escape sequence and stores the character with that code.

Here is a general fix that works in Java and in many other programming languages: add another "\" before the 77. The compiler turns the double backslash into a single literal backslash, so the digits that follow are no longer read as an escape.
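A minimal sketch of that fix, assuming the same path as in the question with the last backslash doubled. The class name `EscapedPathDemo` and the helper `split` are made up for illustration; only `StringTokenizer` comes from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class EscapedPathDemo {
    // Hypothetical helper: splits a path on backslashes with StringTokenizer.
    static List<String> split(String path) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(path, "\\");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Every backslash is doubled, so no octal escape is formed.
        System.out.println(split("C:\\temp\\\\7800000\\7767546.pdf"));
        // prints [C:, temp, 7800000, 7767546.pdf]
    }
}
```

With the backslash escaped there are four tokens rather than three, so the question's `new String[3]` would overflow; the sketch collects tokens in a `List` for that reason.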

Jake Chasan
  • @ajb Yes, just figured out that `\77` is interpreted as octal, which is a question mark `?`. – rgettman Mar 20 '14 at 20:53
  • @Cruncher [Why do Java octal escapes only go up to 255?](http://stackoverflow.com/a/9543611/897024) – kapex Mar 20 '14 at 20:56
  • @Cruncher Because C was originally written for a PDP-11. Does that explain everything? I didn't think so. – ajb Mar 20 '14 at 20:58
  • @Cruncher The [JLS Section 3.10.6](http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.6) explains octal literals in escape sequences. Why is beyond me. – rgettman Mar 20 '14 at 20:58
  • Or maybe it was PDP-8... see http://stackoverflow.com/questions/1835465/where-did-the-octal-hex-notations-come-from – ajb Mar 20 '14 at 21:05
  • It's octal because hexadecimal was considered only suitable for use by heathen IBMers up until about 1990. – Hot Licks Mar 20 '14 at 21:07
  • (Actually, way back in the olden days programmers counted on their fingers. However, since they lacked opposable thumbs they only counted to eight. But illiterate, uncultured programmers were obviously barefoot, and therefore could count to 16. So using octal was a sign of a cultured upbringing.) – Hot Licks Mar 20 '14 at 21:11
  • @HotLicks That's at least as good an explanation as the others I've seen. – ajb Mar 20 '14 at 21:30
  • @Cruncher - Any scheme (other than mine -- which I'm not going to reveal) is arbitrary. – Hot Licks Mar 20 '14 at 21:30
  • @ajb - Hey, I read it in Datamation, so it must be true! (Of course, it was in a letter to the editor I wrote to Datamation, but I don't see how that affects the truthiness of it.) – Hot Licks Mar 20 '14 at 21:32
  • thank you to those who said it's octal... I didn't know Java automatically treats digits after a backslash, like \11 to \77, as octal. I would have thought it would just give me a compiler error for not escaping my \ properly... I do get a compiler error if the digits after my \ are 8 or 9 (i.e. \88 and \99) – ayeen c Mar 21 '14 at 14:46
7

As the Java Language Specification states

OctalEscape:
    \ OctalDigit
    \ OctalDigit OctalDigit
    \ ZeroToThree OctalDigit OctalDigit

OctalDigit: one of
    0 1 2 3 4 5 6 7

ZeroToThree: one of
    0 1 2 3

the following, when it appears inside a String or character literal, is an octal escape

\77

Octal 77 is decimal 63, which is the ASCII code for the ? character.

Note that this has nothing to do with the StringTokenizer. It applies to your String literal

"C:\\temp\\\\7800000\7767546.pdf"

which, if you printed out, would print as

C:\temp\\7800000?67546.pdf

because that is the value stored.
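A small sketch to check this without any StringTokenizer involved. The class name `OctalEscapeDemo` and the constant `PATH` are made up for illustration; the literal itself is the one from the question.

```java
public class OctalEscapeDemo {
    // The literal from the question, with the last backslash left unescaped.
    static final String PATH = "C:\\temp\\\\7800000\7767546.pdf";

    public static void main(String[] args) {
        // '\77' is an octal escape: octal 77 is decimal 63.
        System.out.println((int) '\77');               // prints 63
        System.out.println(Integer.parseInt("77", 8)); // prints 63
        System.out.println('\77' == '?');              // prints true

        // The '?' is already part of the compiled string value.
        System.out.println(PATH); // prints C:\temp\\7800000?67546.pdf
    }
}
```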

Sotirios Delimanolis
5

In this string literal:

String path = "C:\\temp\\\\7800000\7767546.pdf";

you forgot to escape the last \. What actually happens is this: According to JLS 3.10.6, \ may be followed by one, two, or three octal digits, and if it's followed by three octal digits, the first one must be 0 through 3. The compiler will take the longest substring that meets the rule. Since \776 doesn't follow the rules (the first digit is larger than 3), that means it interprets \77 as an escape sequence, where 77 is treated as an octal number, which equals 63 in decimal, which is the ASCII code for '?'.
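The longest-match rule can be observed by dumping the character codes of a few literals. This is only a sketch; the class name `LongestOctalMatch` and the helper `codes` are made up for illustration.

```java
import java.util.Arrays;

public class LongestOctalMatch {
    // Hypothetical helper: returns the char codes of a string, e.g. "[63, 54]".
    static String codes(String s) {
        return Arrays.toString(s.chars().toArray());
    }

    public static void main(String[] args) {
        // First digit is 7 (> 3), so only two digits are consumed: \77 then '6'.
        System.out.println(codes("\776")); // prints [63, 54]
        // First digit is 3, so all three digits form one escape: octal 377 = 255.
        System.out.println(codes("\377")); // prints [255]
        // First digit is 4 (> 3): \47 is the apostrophe (39), followed by '7' (55).
        System.out.println(codes("\477")); // prints [39, 55]
    }
}
```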

ajb
4

"\77" is an octal escape sequence. It's decimal 63, or the '?' character.

erickson
0

thank you to those who said it's octal... that does make sense...

I didn't know Java automatically converts those kinds of numbers to octal (0\11 to 0\77). I would have thought it would just give me a compiler error for not escaping my \ properly... I do get a compiler error if the digits after my \ are 8 or 9 (i.e. 0\88 and 0\99)

for those who said I should add another \, I'm not sure if you saw my disclaimer that says: "forget about the fact that I should be delimiting the \ in the "\7767546" but I'm just curious what's with the \11 until \77 in java", but thank you for the concern all the same...

ayeen c