After reading the user's contacts from the device, I want to display them in a list grouped into sections. I do this by extracting the first human-readable letter from each name, removing diacritical marks, and creating a section for each distinct letter. This works well until users type emoji into their contacts, for instance:
Those three items appear in my laptop's contact list grouped under the # section. However, my algorithm creates two extra sections for them, one per leading emoji, which is not desirable. I don't want to put everything non-ASCII into the # group, because users of non-Latin scripts (Japanese, Russian, Korean, etc.) won't like that, but I don't know these languages, so I don't know how their names should be grouped.
Is there a table I can use to tell whether a character should go into this numeric # section or should get its own human-readable section letter? And does this work universally, or is it locale-dependent, with some countries grouping letters differently from others for cultural reasons?
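
For reference, here is a minimal Python sketch of the grouping described above, with one possible fix for the emoji case. It is only an illustration under assumptions: the helper names `section_key` and `group_contacts` are hypothetical, and using the Unicode general category as the "is this a letter?" test is an assumption, not an established rule. It does not address locale-specific grouping at all.

```python
import unicodedata
from collections import defaultdict


def section_key(name: str) -> str:
    """Return a section heading for a contact name (hypothetical helper)."""
    stripped = name.strip()
    if not stripped:
        return "#"
    # NFD splits an accented letter into its base letter plus combining
    # marks, e.g. "É" becomes "E" followed by U+0301.
    base = unicodedata.normalize("NFD", stripped[0])[0]
    # Assumption: any character whose general category starts with "L"
    # (a letter in any script) gets its own section.
    if unicodedata.category(base).startswith("L"):
        return base.upper()
    # Digits, punctuation, symbols and emoji all fall back to "#".
    return "#"


def group_contacts(names):
    """Group names into sections keyed by section_key (hypothetical helper)."""
    sections = defaultdict(list)
    for name in names:
        sections[section_key(name)].append(name)
    return dict(sections)
```

With this sketch, names starting with an emoji or a digit land in #, while names in Cyrillic or kanji keep a section of their own. Note that NFD also decomposes Hangul syllables into jamo, so a Korean name would be keyed by its initial consonant; whether that is the grouping users expect is exactly the part I am unsure about.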