
I'm writing a Java library that builds URLs from a list of filenames, like this:

final String domain = "http://www.example.com/";

String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};

System.out.println(domain + normalize(filenames[0]));
// Should print "http://www.example.com/Normal_text"
System.out.println(domain + normalize(filenames[1]));
// Should print "http://www.example.com/Ich_weib_nicht"
System.out.println(domain + normalize(filenames[2]));
// Should print "http://www.example.com/L_ho_inserito_tra_i_principi"

Is there a Java library that exposes a method like the `normalize` I'm using in the code above?


mat_boy
  • Take a look at this: http://stackoverflow.com/questions/21489289/what-is-the-equivalent-of-stringbyfoldingwithoptionslocale-in-java/21489947#21489947 – StoopidDonut Feb 10 '14 at 13:32
  • @PopoFibo Yes, it works! I had never seen the `Normalizer` class in Java! Thanks a lot! Can you post an answer with a short example? – mat_boy Feb 10 '14 at 13:37

2 Answers


Taking the content from my previous answer here: you can use `java.text.Normalizer`, which gets you most of the way. It decomposes each accented character into its base letter plus a separate combining accent, which you can then strip. An example of normalization:

Accent removal:

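import java.text.Normalizer;

String accented = "árvíztűrő tükörfúrógép";
// NFD splits each accented letter into its base letter followed by a combining mark
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
// Remove every non-ASCII character, which strips the combining marks
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");

System.out.println(normalized);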

Gives:

arvizturo tukorfurogep
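Applied to the question's filenames, the accent-removal step can be combined with a second pass that turns spaces, apostrophes and other punctuation into underscores. The following is a minimal sketch, not a library API: the class name `Slugs`, the `normalize` helper and the exact character rules are assumptions for illustration. Note that NFD has no decomposition for `ß`, so the `[^\p{ASCII}]` step above would silently drop it ("Ich weiß nicht" would become "Ich_wei_nicht"); the sketch maps `ß` to "ss" by hand, which is also an assumption and differs from the "Ich_weib_nicht" shown in the question.

```java
import java.text.Normalizer;

class Slugs {

    // Hypothetical helper: name and rules are illustrative assumptions, not a library method.
    static String normalize(String input) {
        // Assumption: transliterate ß by hand, because NFD leaves it as a single character.
        String s = input.replace("ß", "ss");
        // NFD splits accented letters into base letter + combining mark (e.g. "ì" -> "i" + U+0300).
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks (Unicode category M).
        s = s.replaceAll("\\p{M}", "");
        // Replace each run of characters outside [A-Za-z0-9] with a single underscore.
        s = s.replaceAll("[^A-Za-z0-9]+", "_");
        // Trim underscores left at the start or end.
        return s.replaceAll("^_+|_+$", "");
    }

    public static void main(String[] args) {
        final String domain = "http://www.example.com/";
        String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};
        for (String name : filenames) {
            System.out.println(domain + normalize(name));
        }
    }
}
```

With these rules the three filenames become `Normal_text`, `Ich_weiss_nicht` and `L_ho_inserito_tra_i_principi`. Whitelisting `[A-Za-z0-9]` rather than blacklisting "special" characters keeps the output ASCII-only even for characters the sketch does not anticipate; other letters without a decomposition (for example `ø` or `æ`) would simply become underscores unless mapped explicitly like `ß`.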
StoopidDonut

Assuming you want to encode the strings to make them safe for a URL, use `URLEncoder`:

import java.net.URLEncoder;

final String domain = "http://www.example.com/";

String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};

// encode(String, String) declares UnsupportedEncodingException, so the enclosing method must catch or declare it
System.out.println(domain + URLEncoder.encode(filenames[0], "UTF-8"));
System.out.println(domain + URLEncoder.encode(filenames[1], "UTF-8"));
System.out.println(domain + URLEncoder.encode(filenames[2], "UTF-8"));
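To see what the objection in the comments below is about, this small sketch prints the raw `URLEncoder` output for the three sample names. The class name `UrlEncoderDemo` and the `encodeForForm` wrapper are made up for illustration; the wrapper only converts the checked exception, since UTF-8 is always available on the JVM.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncoderDemo {

    // Hypothetical wrapper around URLEncoder.encode(String, String).
    static String encodeForForm(String s) {
        try {
            // application/x-www-form-urlencoded: space -> '+', letters, digits and ".-*_" kept,
            // every other character -> %XX for each byte of its UTF-8 encoding
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};
        for (String name : filenames) {
            System.out.println(encodeForForm(name));
        }
    }
}
```

This prints `Normal+text`, `Ich+wei%C3%9F+nicht` and `L%27ho+inserito+tra+i+princ%C3%ACpi`. The output is a valid encoding, but it is full of `%` and `+` characters and is hard to read. Also, `URLEncoder` targets form data: in a URL path a `+` is a literal plus sign, not a space.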
mikea
  • No, because I can't use "special" chars like `%` and so on, which come from the `URLEncoder.encode()` method. I'm creating URLs that must be accepted by a special XML validator. It requires no whitespace, no special chars, and so on – mat_boy Feb 10 '14 at 13:40
  • So they aren't URLs then – mikea Feb 10 '14 at 13:41
  • No no, they are! The XML contains a list of elements, and each element has an `rdf:about` property whose value is a URL – mat_boy Feb 10 '14 at 13:42
  • In that case I would use `StringEscapeUtils.escapeXml()` from the Apache Commons Lang library, but I don't see what that has to do with URLs – mikea Feb 10 '14 at 14:00
  • That method does not satisfy the validator. E.g., `StringEscapeUtils.escapeXml("l'avevo all'università")` is escaped as `l&apos;avevo all&apos;università`: the accented "a" is still there! Moreover, the text is not human-readable. – mat_boy Feb 10 '14 at 14:06