Although this seems like a trivial question, I am quite sure it is not :)
I need to validate names and surnames of people from all over the world. Imagine a huge list of millions of names and surnames from which I need to remove, as well as possible, any cruft I identify. How can I do that with a regular expression? If they were only English names, I think this would cut it:
^[a-z '-]+$
However, I also need to support these cases:
- other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)
- different Unicode letter sets (accented letters, Greek, Japanese, Chinese, and so on)
- no numbers or symbols or unnecessary punctuation or runes, etc.
- titles, middle initials, suffixes are not part of this data
- given names are already separated from surnames.
- we are prepared to force ultra rare names to be simplified (there's a person named '@' in existence, but it doesn't make sense to allow that character everywhere. Use pragmatism and good sense.)
- note that many countries have laws about names so there are standards to follow
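To make the requirements above concrete, here is a rough sketch of the kind of check I have in mind, written in Python with only the standard library. It is not a standard and almost certainly not complete: the function name `looks_like_name` and the `ALLOWED_PUNCTUATION` set are my own guesses, and it still lets through things I listed as unwanted (runes, for example, are ordinary Unicode letters, so they pass).

```python
import unicodedata

# Assumption: a guessed, deliberately small set of punctuation that may
# appear inside a name (space, apostrophe, hyphen, typographic apostrophe).
ALLOWED_PUNCTUATION = {" ", "'", "-", "\u2019"}


def looks_like_name(value: str) -> bool:
    """Return True if value consists only of letters, combining marks,
    and the allowed punctuation, and contains at least one letter."""
    value = unicodedata.normalize("NFC", value.strip())
    has_letter = False
    for ch in value:
        category = unicodedata.category(ch)
        if category.startswith("L"):       # any Unicode letter
            has_letter = True
        elif category.startswith("M"):     # combining marks (accents, vowel signs)
            continue
        elif ch in ALLOWED_PUNCTUATION:
            continue
        else:                              # digits, symbols, other punctuation
            return False
    return has_letter
```

With this, `looks_like_name("Anne-Marie")` and `looks_like_name("山田")` are accepted, while `looks_like_name("John3")` and `looks_like_name("@")` are rejected.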
Is there a standard way of validating these fields I can implement to make sure that our website users have a great experience and can actually use their name when registering in the list?
I am looking for something similar to the many "email address" regexes that you can find on Google.