I'm cleaning up a database of "profiles" of entities (people, organizations, etc.). One part of each profile is the entity's name in its native script (e.g. Thai), encoded in UTF-8. The previous data structure didn't record which script a name was written in, so we now have more records with missing or invalid values than we can review manually.
What I need to do at this point is determine programmatically which language/script any given name is in. With a sample data set of:
Name: "แผ่นดินต้น"
Script: NULL
Name: "አብርሃም"
Script: NULL
I need to end up with:
Name: "แผ่นดินต้น"
Script: Thai
Name: "አብርሃም"
Script: Amharic
I do not need to translate the names, just determine what script they're in. Is there an established technique for figuring this sort of thing out?
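To make the goal concrete, here is a minimal sketch of one possible heuristic, not an established answer. It uses Python's standard `unicodedata` module and assumes that the first word of each character's Unicode name (e.g. `THAI CHARACTER PHO PHUNG`) is a usable stand-in for its script. The function name `guess_script` is hypothetical. Note that this identifies the script rather than the language: the second sample comes back as Ethiopic, and treating that as Amharic would be a further assumption, since other languages use the same script.

```python
import unicodedata
from collections import Counter


def guess_script(name):
    """Guess the dominant script of a string from Unicode character names.

    Heuristic only: assumes the first word of a character's Unicode name
    identifies its script, which fails for multi-word script names
    (e.g. "OLD PERSIAN") and lumps Chinese and Japanese kanji under "CJK".
    """
    counts = Counter()
    for ch in name:
        # Count only letters and combining marks; skip digits, spaces, punctuation.
        if not (ch.isalpha() or unicodedata.category(ch).startswith("M")):
            continue
        try:
            char_name = unicodedata.name(ch)
        except ValueError:
            # Unassigned or unnamed code point; ignore it.
            continue
        counts[char_name.split(" ")[0]] += 1
    if not counts:
        return None
    # Return the most frequent script prefix, e.g. "THAI" -> "Thai".
    return counts.most_common(1)[0][0].capitalize()


if __name__ == "__main__":
    for sample in ["แผ่นดินต้น", "አብርሃም"]:
        print(sample, guess_script(sample))
```

A majority vote is used so that a stray Latin character or punctuation mark inside an otherwise Thai name does not change the result. A library that exposes the Unicode Script property directly would be more robust than parsing character names.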