Encoding of forward slash ('\u002f') in Java strings

Question

I have a problem with the encoding of the forward slash character in Java. The following program illustrates what is happening:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.commons.io.IOUtils;

public class SlashTester {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        String input = "http:\u002f\u002fgoogle.com";
        System.out.println(input); // Prints "http://google.com"

        String input2 = IOUtils.toString(new FileInputStream("hello.txt"), Charset.forName("UTF-8"));
        System.out.println(input2); //Prints "http:\u002f\u002fgoogle.com"
    }
}

The program reads from the file "hello.txt". The file contains just this one line:

http:\u002f\u002fgoogle.com

Note that this is the same text as the literal assigned to the string 'input'.

Can anyone explain to me why there is a difference in the outputs?

score 1 · Accepted Answer · answered Aug 25 '15 at 15:34

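The Java compiler understands the \uXXXX syntax and translates these escapes when it compiles the source code, so the literal assigned to input already contains real slashes in the .class file. Text read from a file at runtime never passes through the compiler, so it is left exactly as it is. If you want the same translation there, you need to do it yourself.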

Note: the escapes don't even have to be inside a String literal. For example,

\u0053\u0079\u0073\u0074\u0065\u006d\u002e\u006f\u0075\u0074\u002e\u0070\u0072\u0069\u006e\u0074\u006c\u006e\u0028\u0022\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064\u0022\u0029\u003b

is the same as

System.out.println("Hello World");

because the compiler translates these escapes before it parses the code.
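As a small illustration of the compile-time translation described above, here is a sketch that compares the two strings from the question. The class and method names (EscapeDemo, compiled, raw) are made up for this example, and the doubled-backslash literal is only a stand-in for the text read from hello.txt.

```java
public class EscapeDemo {

    // The compiler replaces each Unicode escape with '/' before the literal is parsed.
    static String compiled() {
        return "http:\u002f\u002fgoogle.com";
    }

    // A doubled backslash is an escaped backslash, so these characters survive
    // compilation unchanged, just like text read from a file at runtime.
    static String raw() {
        return "http:\\u002f\\u002fgoogle.com";
    }

    public static void main(String[] args) {
        System.out.println(compiled());               // http://google.com
        System.out.println(compiled().length());      // 17
        System.out.println(raw().length());           // 27
        System.out.println(compiled().equals(raw())); // false
    }
}
```

The differing lengths show that the second string really contains a backslash, a 'u' and four hex digits where the first one contains a single slash.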

answered Aug 25 '15 at 15:34

Peter Lawrey


Thanks. This explains it! I de-compiled the .class file and saw that the string is just `"http://google.com"`. – MediumOne Aug 25 '15 at 15:41
@MediumOne a crude way to fix these characters is to do `input2 = input2.replaceAll("\\\\u002f", "/");` – Peter Lawrey Aug 25 '15 at 15:43
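
If more than one kind of escape can show up in the file, the replaceAll idea can be generalised. The following is only a sketch under a few assumptions: the class name UnicodeEscapeDecoder and the method decode are hypothetical, every escape is a backslash, a single 'u' and exactly four hex digits, and an escaped backslash in front of the 'u' is not treated specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {

    // Regex for a literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every matched escape with the character it stands for.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());                   // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // hex digits to char
            last = m.end();
        }
        out.append(text, last, text.length());                   // copy the remainder
        return out.toString();
    }

    public static void main(String[] args) {
        // Same characters as the content of hello.txt in the question.
        String fromFile = "http:\\u002f\\u002fgoogle.com";
        System.out.println(decode(fromFile)); // http://google.com
    }
}
```

In the question's program this would be called as decode(input2) after the file has been read.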

score 1 · Answer 2 · answered Aug 25 '15 at 15:39

You have to write your String with a double backslash to avoid this behaviour, like this:

String input = "http:\\u002f\\u002fgoogle.com";

This prints http:\u002f\u002fgoogle.com. In a String literal, special characters have to be escaped with a backslash \ placed before them, so \\ stands for a single literal backslash and the u002f after it is no longer read as a Unicode escape.

For more information, see the javadocs.

I hope this helps!

score 0 · Answer 3 · answered Jun 09 '21 at 22:04

For anyone who wants to convert an escaped Unicode URL or string to an unescaped one, like I did:

escaped characters:

'http\\u00253A\\u00252F\\u00252Fexample.com';

unescaped characters:

'http://example.com'

This JavaScript trick (from radicant's answer) can save you hours:

decodeURIComponent(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
> 'http://example.com'
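
Since the question is about Java, here is a rough Java counterpart of the same two-step idea: first turn the Unicode escapes into characters, then percent-decode the result. It is a sketch, not a drop-in replacement. The names EscapedUrlDecoder, unescapeUnicode and decodeUrl are hypothetical, and java.net.URLDecoder performs form decoding, so unlike decodeURIComponent it also turns '+' into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedUrlDecoder {

    // Regex for a literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Step 1: replace each Unicode escape with the character it stands for.
    static String unescapeUnicode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    // Step 2: percent-decode the unescaped text as UTF-8.
    static String decodeUrl(String escaped) {
        try {
            return URLDecoder.decode(unescapeUnicode(escaped), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same characters as the escaped example above.
        String escaped = "http\\u00253A\\u00252F\\u00252Fexample.com";
        System.out.println(decodeUrl(escaped)); // http://example.com
    }
}
```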
