What you want to do is called Unicode normalization (or, more precisely, transliteration to ASCII), I believe.
This post explains some of the foundations. Try this:
php > $mystring = "từ khóa a,từ khóa b, từ khóa c";
php > $mykeyword = "tu khoa b";
php > var_dump(transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $mystring));
string(30) "tu khoa a,tu khoa b, tu khoa c"
php >
Now you can use the normal string manipulation functions to see if $mykeyword is contained within the transliterated $mystring. Note that characters which don't have ASCII translations will be removed.
Note that for transliterator_transliterate() to work, you need the PHP intl module installed (often a package called php5-intl, or php-intl on newer distributions). See here.
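The containment check described above can be sketched roughly as follows. The helper name to_ascii() is made up for illustration and is not part of PHP or intl. It assumes the intl extension is loaded and runs the keyword through the same transliteration, so an accented keyword would match as well.

```php
<?php
// Requires the intl extension for transliterator_transliterate().

// Hypothetical helper (not a built-in): reduce a UTF-8 string to plain ASCII
// using the same transliteration rules as in the session above.
function to_ascii(string $text): string
{
    $ascii = transliterator_transliterate(
        'Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove',
        $text
    );
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed');
    }
    return $ascii;
}

$mystring  = "từ khóa a,từ khóa b, từ khóa c";
$mykeyword = "tu khoa b";

// strpos() returns 0 for a match at the very start of the string,
// so compare against false explicitly instead of relying on truthiness.
$found = strpos(to_ascii($mystring), to_ascii($mykeyword)) !== false;

var_dump($found);
```

Given the transliterated string shown earlier, this should report a match for "tu khoa b".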
You can also use the Normalizer and preg_replace() to strip accents:
php > var_dump(preg_replace('/\p{Mn}/u', '', Normalizer::normalize($mystring, Normalizer::FORM_KD)));
string(30) "tu khoa a,tu khoa b, tu khoa c"
php >
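If you go the Normalizer route, it may be convenient to wrap the two calls in a small function and apply it to both the haystack and the keyword. This is only a sketch: strip_accents() is a hypothetical name, and the intl extension is assumed to be available.

```php
<?php
// Requires the intl extension for the Normalizer class.

// Hypothetical helper (not a built-in): decompose the string (NFKD),
// then drop the combining marks (\p{Mn}) left behind by the decomposition.
function strip_accents(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // preg_replace() returns null on error; fall back to an empty string.
    return preg_replace('/\p{Mn}/u', '', $decomposed) ?? '';
}

$mystring  = "từ khóa a,từ khóa b, từ khóa c";
$mykeyword = "từ khóa b"; // an accented keyword works too

// Strip accents on both sides before comparing.
$found = strpos(strip_accents($mystring), strip_accents($mykeyword)) !== false;

var_dump($found);
```

One caveat with this approach: letters that have no Unicode decomposition, such as đ, pass through unchanged, because there is no combining mark to remove.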
Yet another way is to use iconv():
php > var_dump(preg_replace('/[^a-zA-Z0-9 -]+/', '', iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $mystring)));
string(25) "t khoa at khoa b t khoa c"
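However, as you can see, the ừ didn't properly translate: it was lost instead of becoming u, and the character class in the regex also removed the commas. The result of //TRANSLIT can also differ between platforms and locales.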