I want to understand how the use of CLDR changed in JDK 9.

JDK 9 CLDR - As per JDK docs

In JDK 9, the default locale data uses data derived from the Unicode Consortium's Common Locale Data Repository (CLDR). As a result, users may see differences in locale sensitive services behavior and/or translations.

Can someone help me understand this and tell me what its repercussions will be for code written against the JDK 8 Date API?

Ole V.V.
T-Bag
  • 2
    Enter/copy “Unicode Consortium's Common Locale Data Repository (CLDR)” into Google and follow the first result link. – Holger Jul 12 '18 at 11:12
  • @Holger-- Nice answer, actually this is what I am looking for. – T-Bag Jul 12 '18 at 11:15
  • 1
    I am downvoting because the question seems to be poorly researched. – Ole V.V. Jul 12 '18 at 11:35
  • 1
    For a real-world example of the impact of using [Unicode CLDR](http://cldr.unicode.org/) (see [Wikipedia](https://en.wikipedia.org/wiki/Common_Locale_Data_Repository)), see the Question, [*Java's MessageFormat Not Localizing Portuguese Months in Dates in Lowercase*](https://stackoverflow.com/q/51275322/642706). The CLDR defines the rules that embody the cultural norms for localization issues such as how to capitalize the name of a month. – Basil Bourque Jul 12 '18 at 21:09

2 Answers

7

The CLDR (Common Locale Data Repository) is a set of data collected by the Unicode Consortium that many libraries use to provide data related to internationalization.

It contains things like:

  • information on how dates/times are formatted in a given locale
  • information on how sorting of text (collation) happens in a given locale
  • information on how numbers are represented in a given locale
  • names for currencies, units and geographic regions
  • ...
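
To make the list above concrete, here is a minimal sketch of two locale-sensitive services in the JDK that draw on this kind of data: number formatting and collation. The class and method names (`LocaleDataDemo`, `formatNumber`, `sortFor`) are made up for illustration; only the `java.text` and `java.util` calls are real JDK API.

```java
import java.text.Collator;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; only the JDK calls inside are real API.
class LocaleDataDemo {

    // Formats a number using the grouping and decimal symbols of the locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Returns a sorted copy of the words, ordered by the locale's collation rules
    // instead of raw char values (which would put "Zebra" before "apfel").
    static String[] sortFor(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.89, Locale.US));      // 1,234,567.89
        System.out.println(formatNumber(1234567.89, Locale.GERMANY)); // 1.234.567,89
        System.out.println(Arrays.toString(sortFor(Locale.GERMANY, "Zebra", "apfel", "Äpfel")));
    }
}
```

The same number comes out with different separators per locale, and the collator sorts by language rules rather than by code point.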

Note: a "locale" is basically "a language as spoken in a given region". It's a bit more involved than that, but that's a good high-level approximation. "en-US", for example, represents American English and "de-DE" is German as spoken in Germany.
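
As a small illustration of the note above, this sketch parses such tags with `Locale.forLanguageTag` and splits them into their language and region parts. `LocaleTagDemo` and `describe` are hypothetical names used only for this example.

```java
import java.util.Locale;

// Hypothetical demo class; Locale.forLanguageTag, getLanguage and getCountry are real JDK API.
class LocaleTagDemo {

    // Splits a BCP 47 language tag into "language / region".
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        return locale.getLanguage() + " / " + locale.getCountry();
    }

    public static void main(String[] args) {
        System.out.println(describe("en-US")); // en / US
        System.out.println(describe("de-DE")); // de / DE
    }
}
```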

The JDK has traditionally maintained its own set of data for that. That changed in Java 9 and later, with most implementations of Java now using the CLDR by default. See JEP 252: Use CLDR Locale Data by Default.

Having worked with both JDK data and CLDR data, I can say that on average the CLDR data is much better and more actively maintained, and (probably most importantly) it has a defined process for submitting improvements and bug reports.

The practical consequence is that some formatting might behave slightly differently than it did before: in most cases more correctly, but possibly in unexpected ways. This applies especially to non-English languages (the effects of the change on English locales are rather small).
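
One way to see the difference yourself is to run the same formatting code under both data sets. The sketch below prints a full-style date for a few locales; run it once on Java 8 and once on Java 9 or later and compare the output. JEP 252 also describes the `java.locale.providers` system property (for example `-Djava.locale.providers=COMPAT,CLDR`) for preferring the legacy data on JDK releases that still ship it. The class name, the method name and the choice of locales are assumptions made for this example, and the exact output is deliberately not shown because it depends on the JDK version and provider order.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical demo class for comparing locale data between JDK versions/providers.
class ProviderComparison {

    // Formats a date in the locale's "full" style; the pattern comes from the locale data.
    static String fullDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        // Shows which provider order was requested, if any.
        System.out.println("java.locale.providers = "
                + System.getProperty("java.locale.providers", "(not set, JDK default)"));

        LocalDate date = LocalDate.of(2018, 7, 12);
        for (String tag : new String[] {"en-US", "de-DE", "pt-BR"}) {
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag + ": " + fullDate(date, locale));
        }
    }
}
```

The code itself does not change between runs; any difference in the printed lines comes purely from the locale data in use.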

Basil Bourque
Joachim Sauer
  • 1
    Good answer. One more plus for CLDR is that it has a rich collection of sub-culture variants lacking in the old Java-specific localization rules. – Basil Bourque Nov 14 '21 at 03:59
1

CLDR encapsulates the rules for sorting and formatting content (e.g. date and currency formats) for locales around the world. This is a big data set that is closely tied to Unicode itself.

CLDR is designed to be a formal, stable set of these definitions.

Since the CLDR rules differ, in some cases and for some locales, from those that were built into Java 8 and earlier, the JDK documentation includes that warning.

Dragonthoughts