
I need to create slugs (human-readable URL slugs generated from an arbitrary string) for both English and non-English text, for example Chinese, Japanese, Cyrillic, and any other script.

So every string, in any language, must be converted to the English characters a-z and 0-9 only, for example java-slugify-string-for-non-english-characters

How can I achieve this in Java?

Deduplicator
alexanoid
  • Do you mean that the strings should be ASCII? – mk. Mar 15 '15 at 20:46
  • I mean that every string, in any language, must be converted to the English characters a-z and 0-9 only, for example java-slugify-string-for-non-english-characters – alexanoid Mar 15 '15 at 20:50
  • So, for "ログイン" which I got from the google.co.jp home page, what would you expect it to "slugify" to? – Kenster Mar 15 '15 at 21:30

2 Answers


You can use Slugify, a slug-generation library written in Java: https://github.com/slugify/slugify
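
To illustrate what the ASCII-folding part of slugifying involves, here is a minimal JDK-only sketch. It is not the Slugify library's code or API; the class `SimpleSlug` and its `slugify` method are hypothetical names used only for this example. It assumes Latin-based input with diacritics. Chinese, Japanese, or Cyrillic text would need a separate transliteration step that this sketch does not attempt, so such characters are simply dropped here.

```java
import java.text.Normalizer;
import java.util.Locale;

class SimpleSlug {
    // Hypothetical helper for illustration; not part of the Slugify library.
    static String slugify(String input) {
        // Decompose accented letters into base letter + combining mark (NFD).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks so only the base letters remain.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Lowercase, then turn every run of characters outside a-z/0-9 into one hyphen.
        String hyphenated = stripped.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
        // Remove hyphens left over at the start or end.
        return hyphenated.replaceAll("^-+|-+$", "");
    }

    public static void main(String[] args) {
        System.out.println(slugify("Crème Brûlée à la carte!")); // prints creme-brulee-a-la-carte
    }
}
```

Because anything that is not a-z or 0-9 after normalization becomes a separator, a string written entirely in a non-Latin script yields an empty slug with this approach, which is why a transliterating library is the more practical choice for the question as asked.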

Dermot Blair

Convert each character into its integer (UTF-16 code unit) value, and concatenate the results:

    String foo = "中国";
    StringBuilder result = new StringBuilder();
    // Append each UTF-16 code unit as a backslash followed by its decimal value.
    for (int i = 0; i < foo.length(); i++) {
        result.append('\\').append((int) foo.charAt(i));
    }
    System.out.println(result);

Produces:

    \20013\22269

...which is pretty easy to split and convert back to a string. You can also pad the numbers, convert them to hex, and add exclusions so that ASCII/English characters aren't converted, if you'd like. You could also have a look at other, more standard ways of doing this sort of encoding.
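
As a rough sketch of the variations mentioned above (padding, hex, and leaving ASCII letters and digits alone), something like the following could work. The class `EscapeCodec` and its `encode`/`decode` methods are hypothetical names, and the format (a backslash followed by exactly four hex digits per UTF-16 code unit) is just one assumed choice; the fixed width is what makes decoding unambiguous.

```java
class EscapeCodec {
    // Hypothetical helper: ASCII letters and digits pass through unchanged;
    // every other UTF-16 code unit becomes a backslash plus 4 hex digits.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean asciiAlnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiAlnum) {
                out.append(c);
            } else {
                out.append('\\').append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Reverses encode(). Assumes well-formed input: every backslash is
    // followed by exactly four hex digits.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                i += 5;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("Java 中国");
        System.out.println(encoded);         // prints Java\0020\4e2d\56fd
        System.out.println(decode(encoded)); // prints Java 中国
    }
}
```

Note that the result still contains backslashes, so it is reversible but not limited to a-z and 0-9; a URL slug would need a different escape marker.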

mk.