I was granted with the beautiful task ;-) to design some tables in a MySQL Database which should hold human names.
Criteria:
- I have only the full names. (There is no separation for e.g. prename, surname and so on)
The storage should be diacritic sensitive. (The following names stand for different persons)
- "Voss" and "Voß".
- "Joel" and "Joël".
- "franc" and "Franc" and "Fránc".
A search should return all similar names to the search string: E.g: Search for "franc" should return ["franc", "Franc", "Fránc"] and so on... (It would be awesome if the search would return not only the diacritice insensitive matches but perhaps similar sounding names or names that match in parts to the search string, too...)
I thougt of using the COLLATION utf8_bin
for the column (declared as unique
) in which I will store the names. This would satisfy point 2. But this will hurt point three. Declaring the column name as unique
with collation utf8_unicode_ci
satisfys point 3. but it hurts point two.
So my question is: Is there a way to solve this task and respecting all criteria? And since I don't want to reinvent the wheel: Is there an elegant way to handle human names (and their searches) in databases? (Sadly, I do not have the possibility of splitting the names into prename, surnames and optional middlenames...)
Edit:
The amount of names is arount a million (~1.000.000) entrys. And if it matters: I am using python as scripting language to populate the database and query the data later on.