Private Use Areas within Unicode
You’ve not really explained what goal you are trying to achieve, but you likely have no need to invent either:
- a character set (a collection of numbers, each assigned to a particular character), or
- a character encoding (a way to represent instances of those numbers as bits and bytes).
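Unicode defines over 144,000 characters, each assigned a number (a “code point”) in the range zero to 1,114,111 (hex 10FFFF), just over a million. That leaves large gaps of unassigned numbers. Some of those empty sub-ranges are reserved for future use. But, of interest to you, some sub-ranges are set aside for “private use”, never ever to be assigned to a character by the Unicode Consortium: U+E000 through U+F8FF (6,400 code points in the Basic Multilingual Plane), plus U+F0000 through U+FFFFD and U+100000 through U+10FFFD (65,534 code points each, in planes 15 and 16). See Wikipedia.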
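To make those ranges concrete, here is a small sketch that tests whether a code point falls in one of the three private use areas. It cross-checks the hard-coded ranges against Java's own Character.getType lookup, which reports Character.PRIVATE_USE for these code points. The class and method names are illustrative, not part of any library.

```java
public class PrivateUseAreas {

    // The three private use ranges defined by the Unicode Standard.
    static boolean isPrivateUse(int codePoint) {
        return (codePoint >= 0xE000 && codePoint <= 0xF8FF)       // BMP Private Use Area
            || (codePoint >= 0xF0000 && codePoint <= 0xFFFFD)     // plane 15 (Supplementary PUA-A)
            || (codePoint >= 0x100000 && codePoint <= 0x10FFFD);  // plane 16 (Supplementary PUA-B)
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0xE000, 0xF8FF, 0xF0000, 0x10FFFD, 0x10FFFF};
        for (int cp : samples) {
            // Compare our range check with the JDK's general-category data.
            boolean viaJava = Character.getType(cp) == Character.PRIVATE_USE;
            System.out.printf("U+%04X  ranges=%b  Character.getType=%b%n",
                    cp, isPrivateUse(cp), viaJava);
        }
    }
}
```

Note that U+10FFFE and U+10FFFF (like U+FFFFE and U+FFFFF) are noncharacters, not private use, which is why the supplementary ranges stop at ...FFFD.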
You are free to assign any meaning you wish to any number within those “private use areas”. So that works as your character set.
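As a minimal sketch of what “your own character set” might look like in code, you could simply name the code points you have claimed and keep a table of their meanings. The glyph names and meanings below are hypothetical placeholders for whatever your project actually needs.

```java
import java.util.Map;

public class MyPrivateCharacters {

    // Hypothetical assignments: these meanings exist only within your own project.
    static final int SUN_GLYPH  = 0xE000;
    static final int MOON_GLYPH = 0xE001;
    static final int STAR_GLYPH = 0xE002;

    static final Map<Integer, String> MEANINGS = Map.of(
            SUN_GLYPH, "sun",
            MOON_GLYPH, "moon",
            STAR_GLYPH, "star");

    // Look up the project-specific meaning of a code point, if we assigned one.
    static String describe(int codePoint) {
        return MEANINGS.getOrDefault(codePoint, "(not one of ours)");
    }

    public static void main(String[] args) {
        String text = new StringBuilder()
                .appendCodePoint(SUN_GLYPH)
                .appendCodePoint(STAR_GLYPH)
                .toString();
        // Prints "U+E000 -> sun" then "U+E002 -> star".
        text.codePoints().forEach(cp ->
                System.out.printf("U+%04X -> %s%n", cp, describe(cp)));
    }
}
```

Map.of requires Java 9 or later. Only your own software knows these meanings; to everyone else the characters are just private use code points.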
As for your character encoding, UTF-8 is almost always the best choice, for several reasons, as discussed here.
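To show that UTF-8 needs nothing special for private use characters, this sketch encodes single code points with the standard StandardCharsets.UTF_8 charset and prints the resulting bytes in hex. BMP private use characters take three bytes; those in planes 15 and 16 take four. The helper name utf8Hex is just for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8PrivateUse {

    // Encode a single code point as UTF-8 and render the bytes as hex.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0xE000));   // prints EE 80 80 (BMP, 3 bytes)
        System.out.println(utf8Hex(0xF0000));  // prints F3 B0 80 80 (plane 15, 4 bytes)
    }
}
```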
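Java supports all of Unicode, so no extra programming is needed to support your characters. Everything works the same whether a character comes from inside or outside the private use areas. The one thing to keep in mind applies to all of Unicode, not just private use: a Java char is a 16-bit UTF-16 code unit, so characters above U+FFFF, including those in the private use planes 15 and 16, occupy two char values (a surrogate pair). Work with code points (for example String.codePoints and StringBuilder.appendCodePoint) rather than individual char values.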
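Here is a short sketch of ordinary Java string handling applied to private use characters: one from the BMP and one from plane 15. It shows the difference between length() (UTF-16 code units) and the code point count, and a UTF-8 round trip using only standard JDK calls.

```java
import java.nio.charset.StandardCharsets;

public class JavaPrivateUseDemo {

    // One BMP private use character and one from plane 15.
    static String sample() {
        return new StringBuilder()
                .appendCodePoint(0xE000)   // BMP private use: one char
                .appendCodePoint(0xF0000)  // plane 15 private use: surrogate pair, two chars
                .toString();
    }

    public static void main(String[] args) {
        String text = sample();

        // length() counts UTF-16 code units, not characters.
        System.out.println(text.length());                          // prints 3
        System.out.println(text.codePointCount(0, text.length()));  // prints 2

        // Iterate by code point to see each character whole.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));

        // Round-trip through UTF-8 with no special handling.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes, round trip ok: " + decoded.equals(text));
        // prints "7 bytes, round trip ok: true"
    }
}
```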
If you want to involve other people in your endeavor, or want to share documents, then you should be aware that there is an unofficial registry of characters assigned to Private Use numbers, the ConScript Unicode Registry (CSUR). This registry is a volunteer effort, made outside of the Unicode Consortium. It covers characters that are not part of Unicode, mostly ones unlikely ever to be accepted, such as the scripts of invented languages like Klingon from Star Trek. When selecting code point numbers for your characters, you may want to avoid these unofficially registered code points.
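One way to act on that advice is to keep a list of ranges you intend to avoid and check your planned assignments against it. This is a sketch under an assumption: the only range listed is U+F8D0 through U+F8FF, the block commonly cited for Klingon in the ConScript Unicode Registry. Check the registry itself for the ranges that matter to you before relying on any of them. The class, method, and planned code points are hypothetical.

```java
public class AvoidRegisteredRanges {

    // Inclusive [start, end] ranges to avoid. Assumed example: the Klingon block
    // as commonly cited for the ConScript Unicode Registry; verify against the registry.
    static final int[][] AVOID = {
            {0xF8D0, 0xF8FF},
    };

    // True if the code point lies inside any range we want to steer clear of.
    static boolean isAvoided(int cp) {
        for (int[] r : AVOID) {
            if (cp >= r[0] && cp <= r[1]) return true;
        }
        return false;
    }

    // Return the first code point in [from, to] that is not avoided, or -1 if none.
    static int firstFree(int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!isAvoided(cp)) return cp;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical code points you are planning to use.
        int[] planned = {0xE000, 0xE001, 0xF8D5};
        for (int cp : planned) {
            System.out.printf("U+%04X %s%n", cp,
                    isAvoided(cp) ? "collides with a registered range" : "ok");
        }
    }
}
```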