I have a problem with how Java handles the forward slash character when it is written as a Unicode escape. This program illustrates what happens:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.commons.io.IOUtils;

public class SlashTester {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        String input = "http:\u002f\u002fgoogle.com";
        System.out.println(input); // Prints "http://google.com"

        String input2 = IOUtils.toString(new FileInputStream("hello.txt"), Charset.forName("UTF-8"));
        System.out.println(input2); //Prints "http:\u002f\u002fgoogle.com"
    }
}

The program reads the file "hello.txt", whose entire content is:

http:\u002f\u002fgoogle.com

Note that this is the same text as the string literal assigned to 'input'.

Can anyone explain to me why there is a difference in the outputs?

MediumOne

3 Answers

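The Java compiler understands the \uXXXX syntax and translates these escapes when it compiles your source code. Text read from a file at runtime never goes through the compiler, so it is left exactly as it is. If you want the same translation applied to the file's content, you need to do it yourself.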

Note: the escape doesn't even have to be inside a String. For example,

\u0053\u0079\u0073\u0074\u0065\u006d\u002e\u006f\u0075\u0074\u002e\u0070\u0072\u0069\u006e\u0074\u006c\u006e\u0028\u0022\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064\u0022\u0029\u003b

is the same as

System.out.println("Hello World");

because the compiler translates these escapes before it parses the code.
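
To perform the translation yourself on text read at runtime, something along these lines could work. This is only a minimal sketch: the class name UnicodeUnescaper and the method unescape are made up for illustration, and it handles just the simple form of a backslash, a 'u', and four hex digits. It does not reproduce every compiler rule (for example, an escaped backslash in front of the 'u' is not treated specially).

```java
class UnicodeUnescaper {

    // Replaces each backslash + 'u' + four hex digits with the character it
    // denotes. Everything else is copied through unchanged.
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 6 <= text.length()
                    && text.charAt(i + 1) == 'u' && isHex(text, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6; // skip the whole escape
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in text[from, to) is a hexadecimal digit.
    private static boolean isHex(String text, int from, int to) {
        for (int i = from; i < to; i++) {
            if (Character.digit(text.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stands in for the text read from hello.txt: the doubled backslashes
        // put a real backslash into the runtime string.
        String fromFile = "http:\\u002f\\u002fgoogle.com";
        System.out.println(unescape(fromFile)); // prints http://google.com
    }
}
```

If adding a dependency is acceptable, StringEscapeUtils.unescapeJava from Apache Commons Text does a similar job and also covers escapes such as \n and \t.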

Peter Lawrey

To avoid this behaviour, write the String with a double backslash, like this:

String input = "http:\\u002f\\u002fgoogle.com";

This prints http:\u002f\u002fgoogle.com, the same text as the file. In a String literal the backslash is itself the escape character, so a literal backslash has to be written as \\.
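
As a rough illustration of the difference, the sketch below puts the two literals side by side. The class and constant names are invented for the example.

```java
class BackslashDemo {

    // One backslash before the 'u': the compiler turns each escape into '/'
    // before the literal is parsed, so the string holds "http://google.com".
    static final String TRANSLATED = "http:\u002f\u002fgoogle.com";

    // Two backslashes: the pair is an ordinary escaped backslash, so the
    // string keeps a real backslash followed by the characters u002f.
    static final String LITERAL = "http:\\u002f\\u002fgoogle.com";

    public static void main(String[] args) {
        System.out.println(TRANSLATED + " has " + TRANSLATED.length() + " characters");
        System.out.println(LITERAL + " has " + LITERAL.length() + " characters");
    }
}
```

The first string has 17 characters and the second has 27, because each six-character escape sequence survives intact in the second one.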

For more information, see the javadocs.

I hope this helps!

Francisco Romero

For anyone who, like me, needs to convert a Unicode-escaped URL or string to its unescaped form, here is an example.

Escaped characters:

'http\\u00253A\\u00252F\\u00252Fexample.com';

Unescaped characters:

'http://example.com'

This JavaScript trick can save you hours (from radicant's answer):

decodeURIComponent(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
> 'http://example.com'
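
Since the question is about Java, here is a rough Java counterpart of the same two steps: first translate the Unicode escapes, then percent-decode the result. It is a sketch under assumptions, not a drop-in equivalent: the class name EscapedUrlDecoder and the method decode are hypothetical, and java.net.URLDecoder follows form-encoding rules, so unlike decodeURIComponent it also turns '+' into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedUrlDecoder {

    // Matches a backslash, a 'u', and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String escaped) {
        // Step 1: replace each Unicode escape with the character it denotes.
        Matcher m = UNICODE_ESCAPE.matcher(escaped);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char ch = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(sb);

        // Step 2: percent-decode the intermediate string.
        try {
            return URLDecoder.decode(sb.toString(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = "http\\u00253A\\u00252F\\u00252Fexample.com";
        System.out.println(decode(escaped)); // prints http://example.com
    }
}
```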
Ahmed Osama