Perl 5 has a module on CPAN named Text::Unidecode that transliterates Unicode into ASCII. So, for instance, if you hand it the string "“北亰 — it’s the best”" it hands back the string "\"Bei Jing -- it's the best\"". A quick search for Java libraries to do the same thing only turned up code that would strip Unicode characters or turn accented characters into non-accented characters.
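
For contrast, here is a minimal sketch of the two approaches such a search tends to turn up, using only the JDK's `java.text.Normalizer`. The class and method names (`AccentStripper`, `stripAccents`, `stripNonAscii`) are hypothetical and not taken from any particular library. Note how the second call loses the Chinese characters and the punctuation entirely instead of transliterating them:

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of any library.
final class AccentStripper {

    // Decompose to NFD, then drop the combining marks left behind
    // (e.g. U+00E9 becomes "e" followed by U+0301, which is removed).
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Strip accents first, then delete whatever is still outside ASCII.
    static String stripNonAscii(String input) {
        return stripAccents(input).replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "café déjà vu", written with escapes to avoid source-encoding issues
        System.out.println(stripAccents("caf\u00e9 d\u00e9j\u00e0 vu"));
        // prints: cafe deja vu

        // The example string from the question: curly quotes, two CJK
        // characters, an em dash, and a curly apostrophe
        String sample = "\u201c\u5317\u4eb0 \u2014 it\u2019s the best\u201d";
        System.out.println("[" + stripNonAscii(sample) + "]");
        // prints: [  its the best]
    }
}
```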

Does anyone know of a Java library that produces similar output to Text::Unidecode?

brian d foy
Chas. Owens
  • Just be aware that the implemented algorithm is so oversimplified that I can't honestly imagine any reasonable use of this "transliteration" library. To transliterate text from non-Latin to Latin characters, you need at least to know the source and target languages, potentially which transliteration system to use, and in some cases even implicit context knowledge, which makes an automated translation nearly impossible. – jarnbjo Jul 02 '13 at 15:51
  • To whoever voted to close this question for allegedly asking to “recommend a tool, library or favorite off-site resource”: This question does not incite opinionated debate. It asks whether a similar library exists (fact-based), not what the *best/favourite* library would be (opinion-based). This question should remain open so that alternative libraries can be provided as answers. – amon Jul 02 '13 at 16:28
  • @jarnbjo A message is being corrupted when transferring between two databases. The corruption only occurs to non-ASCII characters. While a fix for the corruption is being investigated, it is desirable for the messages to be readable. The language is nearly 100% English (maybe a little Spanish) and we are mostly dealing with problems with em dashes, curly quotes, and the like, but I wanted a more complete stopgap solution that didn't just strip out the offending characters. – Chas. Owens Jul 03 '13 at 16:26
  • http://stackoverflow.com/questions/2096667/convert-unicode-to-ascii-without-changing-the-string-length-in-java – Anonymous Dec 11 '13 at 06:55
  • There are, at the time of writing, [7 different "junidecode" libraries](https://mvnrepository.com/search?q=junidecode&sort=popular) and [5 different "unidecode" libraries](https://mvnrepository.com/search?q=unidecode) that come up when I search mvnrepository (which in turn searches lots of different package repositories). Most or all of these are presumably ports of Perl's unidecode. I haven't dug into the 12 options enough to have a particular recommendation, but you certainly have many options available! – Mark Amery Jul 09 '22 at 14:27
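
For the stopgap case described in the comments above (mostly English text, maybe a little Spanish, with em dashes and curly quotes causing the trouble), a small hand-written mapping may be enough without any library. The following is only a sketch under that assumption: the class name `PunctuationFolder`, the replacement table, and the choice of `?` for unmapped characters are all hypothetical, and this is not how Text::Unidecode or its ports work (they ship much larger tables). It requires Java 9 or later for `Map.of`.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical stopgap for illustration; the table is deliberately tiny.
final class PunctuationFolder {

    // Hand-picked ASCII stand-ins for common typographic punctuation.
    private static final Map<Integer, String> REPLACEMENTS = Map.of(
            0x2018, "'",    // left single quotation mark
            0x2019, "'",    // right single quotation mark / apostrophe
            0x201C, "\"",   // left double quotation mark
            0x201D, "\"",   // right double quotation mark
            0x2013, "-",    // en dash
            0x2014, "--",   // em dash
            0x2026, "...",  // horizontal ellipsis
            0x00A0, " "     // no-break space
    );

    static String toAscii(String input) {
        // Split accented letters into base letter + combining mark,
        // then drop the marks so that e.g. U+00F1 becomes plain "n".
        String base = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");

        StringBuilder out = new StringBuilder(base.length());
        base.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);  // already ASCII
            } else {
                // Mapped punctuation, or "?" as a visible placeholder
                out.append(REPLACEMENTS.getOrDefault(cp, "?"));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "\u201c\u5317\u4eb0 \u2014 it\u2019s the best\u201d";
        System.out.println(toAscii(sample));
        // prints: "?? -- it's the best"
    }
}
```

Unlike a Unidecode port, this leaves the two Chinese characters as `?` rather than producing "Bei Jing", which is the gap the question is asking a library to fill.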

2 Answers

A quick Google search turns up http://junidecode.sourceforge.net/, but it looks like it hasn't been updated for a while.

kittylyst
  • Given that the Perl 5 version hasn't changed since 2001, I doubt it would need updates once it was working. – Chas. Owens Jul 02 '13 at 15:41
  • Not sure about that. Java's Unicode support has gone through quite a few revisions in the last few years. I doubt a library from 2010 is capable of supporting everything the modern platform does, and my understanding is that this is a non-trivial problem. – kittylyst Jul 02 '13 at 18:50

There is another library for Java: unidecode (disclaimer: I’m the author of this library).

Use with Gradle:

implementation 'cz.jirutka.unidecode:unidecode:1.0.1'

Use with Maven:

<dependency>
    <groupId>cz.jirutka.unidecode</groupId>
    <artifactId>unidecode</artifactId>
    <version>1.0.1</version>
</dependency>
Jakub Jirutka
  • Could you edit this to indicate your relationship to the library you're suggesting in the text of the answer? I see it's under your GitHub account and that you've contributed many commits to it. Per [the self-promotion policy](https://stackoverflow.com/help/promotion), *"you must disclose your affiliation in your post"*. – Mark Amery Jul 09 '22 at 14:19
  • Any reason to favour this particular Java port of unidecode over any of the others? – Mark Amery Jul 09 '22 at 14:20
  • I’ve added the disclaimer, although my affiliation is kinda obvious… As for why to favour this port: this answer is 7 years old and I really don’t remember. Do your own research and decide for yourself, based on your needs. – Jakub Jirutka Aug 22 '22 at 18:53