Shouldn't this regex remove the diacritical mark above the "a"?
"hǎo".gsub(/\p{Nonspacing_Mark}/, '')
=> "hǎo"
"hǎo".gsub(/\p{Mn}/, '')
=> "hǎo"
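A likely explanation, assuming the string was typed with the usual precomposed character: "ǎ" is then the single code point U+01CE, whose general category is a lowercase letter (Ll), not a nonspacing mark (Mn). So there is no separate mark for the regex to match. The sketch below uses `\u` escapes (an assumption, so that the precomposed form is unambiguous) and Ruby's built-in `String#unicode_normalize` to compare the two forms:

```ruby
# "ǎ" written explicitly as the single precomposed code point U+01CE
precomposed = "h\u01CEo"

p precomposed.codepoints          # [104, 462, 111]
p precomposed.match?(/\p{Mn}/)    # false: U+01CE is a letter, not a mark

# NFD splits U+01CE into "a" (U+0061) + COMBINING CARON (U+030C)
decomposed = precomposed.unicode_normalize(:nfd)
p decomposed.codepoints           # [104, 97, 780, 111]
p decomposed.match?(/\p{Mn}/)     # true: U+030C is a nonspacing mark
```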
Update:
I think I understand now, based on how this works in Java:
java.text.Normalizer.normalize("hǎo", java.text.Normalizer.Form.NFD).replaceAll("\\p{Mn}+", "")
I need to normalize the string first (NFD) so that "ǎ" is split into a plain "a" followed by the combining diacritical mark, which the regex can then remove.
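The same approach can be sketched in Ruby with `String#unicode_normalize` (available since Ruby 2.2). The helper name `strip_diacritics` is hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper: decompose to NFD, then delete every nonspacing mark.
def strip_diacritics(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

p strip_diacritics("h\u01CEo")   # "hao"
```

One caveat with this design: it only removes marks that Unicode defines as canonical decompositions. Letters such as "ø" or "ł" have no decomposition, so they pass through unchanged.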