How to get all national characters for selected Locale?

Question

In my app I need to generate passwords based on all available national characters, like:

import java.security.SecureRandom;

private String generatePassword(String charSet, int passwordLength) {
    char[] symbols = charSet.toCharArray();
    StringBuilder sbPassword = new StringBuilder(passwordLength);
    // SecureRandom instead of java.util.Random: passwords need an unpredictable source
    SecureRandom wheel = new SecureRandom();

    for (int i = 0; i < passwordLength; i++) {
        int random = wheel.nextInt(symbols.length);
        sbPassword.append(symbols[random]);
    }
    return sbPassword.toString();
}

For Latin we have something like:

charSet="AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";

How can I get a similar String containing all the national characters (the alphabet) for, say, Thai, Arabic or Hebrew?

We all know that Unicode contains the national characters for every locale, so there must be a way to get them. Otherwise I'd be forced to hardcode the national alphabets, which is ugly (in my case the app supports more than 10 locales).

Is there a particular set of "national characters"? I mean, English definitely uses A-Za-z normally, but also occasionally uses accented letters and ligatures, e.g. in café or pædiatrics [if you're really pretentious]. Is there really a limit on the letters used in a particular locale? — Andy Turner, May 04 '20 at 20:11
@M.S. In fact I think [`Character.UnicodeScript`](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.UnicodeScript.html) is what is wanted. — David Conrad, May 04 '20 at 21:30
@DavidConrad I can't quite work out how to use `Character.UnicodeScript`. I'm still trying to figure it out. Can you give me a hint? — Barmaley, May 05 '20 at 15:30
Unfortunately there's no way to do it without iterating over the code points. I've added an answer that should be helpful, I hope. Let me know. — David Conrad, May 05 '20 at 17:04
Something similar was asked in [this question](https://stackoverflow.com/questions/17575840/better-way-to-generate-array-of-all-letters-in-the-alphabet) - and although there are some creative solutions, they involve some level of hard-coding (even if it's only to select the first and last character in a range). The problem with something like `Character.UnicodeScript` is that it probably gives you many more letters than you want for your specific needs. — andrewJames, May 05 '20 at 21:28

score 2 · Accepted Answer · answered May 05 '20 at 17:03

Since you're using char[], you won't be able to represent every Unicode code point in every script: some of them lie outside the Basic Multilingual Plane (BMP) and do not fit in a single char. Unfortunately, there is no easy way to get all the code points for a script without looping through them, like so:
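To illustrate the BMP limitation, here is a small sketch (the class name BmpDemo is just for this example). U+10330 GOTHIC LETTER AHSA is a single code point above U+FFFF, so Java stores it as two chars, a surrogate pair, and a char[] alphabet cannot hold it as one symbol.

```java
public class BmpDemo {
    public static void main(String[] args) {
        // U+10330 GOTHIC LETTER AHSA lies outside the BMP (above U+FFFF)
        int cp = 0x10330;
        String s = new String(Character.toChars(cp));

        System.out.println(Character.UnicodeScript.of(cp));           // GOTHIC
        System.out.println(Character.charCount(cp));                  // 2 chars needed
        System.out.println(s.length());                               // 2 (a surrogate pair)
        System.out.println(s.codePointCount(0, s.length()));          // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0)));   // true
    }
}
```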

char[] charsForScript(Character.UnicodeScript script) {
  StringBuilder sb = new StringBuilder();
  // Only BMP code points (U+0000..U+FFFF) fit in a single char;
  // <= so that the last BMP value, U+FFFF, is also checked
  for (int cp = 0; cp <= Character.MAX_VALUE; ++cp) {
    if (Character.UnicodeScript.of(cp) == script) {
      sb.appendCodePoint(cp);
    }
  }
  return sb.toString().toCharArray();
}

This returns every BMP char assigned to a given script, e.g. LATIN, GREEK, etc.
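As andrewJames notes in the comments, a whole script usually contains more than the everyday alphabet (digits, combining marks, punctuation, compatibility forms). The following is a minimal sketch, assuming that keeping only `Character.isLetter` characters is a good enough approximation of an "alphabet" for password purposes; the class name ScriptPasswords and the method lettersForScript are hypothetical, and generatePassword mirrors the question's method but takes a char[] and uses SecureRandom.

```java
import java.security.SecureRandom;

public class ScriptPasswords {

    // BMP-only, like charsForScript, but additionally keeps only letters,
    // dropping digits, combining marks and punctuation assigned to the script
    static char[] lettersForScript(Character.UnicodeScript script) {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= Character.MAX_VALUE; ++cp) {
            if (Character.UnicodeScript.of(cp) == script && Character.isLetter(cp)) {
                sb.append((char) cp);
            }
        }
        return sb.toString().toCharArray();
    }

    // Same idea as the question's generatePassword, with a secure random source
    static String generatePassword(char[] symbols, int passwordLength) {
        SecureRandom wheel = new SecureRandom();
        StringBuilder sb = new StringBuilder(passwordLength);
        for (int i = 0; i < passwordLength; i++) {
            sb.append(symbols[wheel.nextInt(symbols.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] thai = lettersForScript(Character.UnicodeScript.THAI);
        System.out.println(thai.length + " Thai letters");
        System.out.println(generatePassword(thai, 12));
    }
}
```

Even with the letter filter, some scripts stay broad: ARABIC, for example, also covers the Arabic presentation-form blocks, so you may still want to narrow the result further for your locales.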

To get all code points, even outside the BMP, you could use:

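import java.util.ArrayList;
import java.util.List;

// Named differently: a second charsForScript(UnicodeScript) would clash with the char[] version
int[] codePointsForScript(Character.UnicodeScript script) {
  List<Integer> ints = new ArrayList<>();
  // <= so that the last code point, U+10FFFF, is also checked
  for (int cp = 0; cp <= Character.MAX_CODE_POINT; ++cp) {
    if (Character.UnicodeScript.of(cp) == script) {
      ints.add(cp);
    }
  }
  return ints.stream().mapToInt(i -> i).toArray();
}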
