I'm running into a complicated situation here, and I'm hoping for a push in the right direction.
I need to allow Basic Latin searches to bring back results with diacritics. This is further complicated by the fact that the data is stored HTML-encoded (as entities) rather than as plain text. I have been making some progress, but have come across two problems.
First: I'm able to do a partial conversion of the data into something marginally useful, using something like this:
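// Sample value; in the database it is stored HTML-encoded.
$string = 'Véra';
// Decode any HTML entities into real UTF-8 characters.
$converted = html_entity_decode($string, ENT_COMPAT, 'UTF-8');
// iconv's //TRANSLIT output depends on the current locale.
setlocale(LC_ALL, 'en_US.UTF8');
// Approximate non-ASCII characters with ASCII look-alikes.
$translit = iconv('UTF-8', 'ASCII//TRANSLIT', $converted);
echo $translit;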
This brings back this result: V'era
This is a start, but what I really need is Vera. I can do a preg_replace on the resulting string, but is there a way of just bringing it back without the apostrophe? This is only one example; there are a lot more diacritics in the database (e.g. ñ and more). I feel like this has been addressed before (e.g. iconv returns strange results), but there don't appear to be any solutions listed.
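For reference, one possible way to avoid the stray apostrophe is to skip iconv and strip the accents via Unicode decomposition instead. This is only a sketch: it assumes the intl extension (the Normalizer class) is available, and stripDiacritics is a made-up helper name, not an existing function.

```php
<?php
// Hypothetical helper; assumes the intl extension (Normalizer class) is loaded.
function stripDiacritics(string $html): string
{
    // Turn entities such as &eacute; or &#233; into real UTF-8 characters.
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Canonical decomposition: "é" becomes "e" followed by a combining accent.
    $decomposed = Normalizer::normalize($decoded, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Input was not valid UTF-8; hand back the decoded text unchanged.
        return $decoded;
    }

    // Drop the combining marks (Unicode category Mn) and keep the base letters.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $decoded;
}

echo stripDiacritics('V&eacute;ra'), "\n";              // Vera
echo stripDiacritics('Jos&eacute; Nu&ntilde;ez'), "\n"; // Jose Nunez
```

Letters that have no canonical decomposition (ø, ß, æ and similar) pass through unchanged with this approach, so they would still need separate handling.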
Bigger Problem: I need to take a search string such as Vera and be able to bring back results with Véra as well as results with Vera. However, I believe I need to get problem 1 solved first before I can get to this point.
I'm thinking of something like if ($translit) { return $string; }, i.e. returning the original string whenever its transliterated form matches the search term, but I'm a bit unsure of how to handle this.
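The "compare the transliterated form, return the original" idea could look roughly like the following. It is a sketch only: searchKey and findMatches are hypothetical names, the intl and mbstring extensions are assumed, and the rows are filtered in PHP rather than in the database.

```php
<?php
// Hypothetical helper: reduces a value to a comparison key
// (entities decoded, accents removed, lowercased).
function searchKey(string $value): string
{
    $decoded = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $decomposed = Normalizer::normalize($decoded, Normalizer::FORM_D);
    if ($decomposed === false) {
        $decomposed = $decoded; // invalid UTF-8: fall back to the decoded text
    }
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $decoded;

    return mb_strtolower($stripped, 'UTF-8');
}

// Hypothetical helper: keeps the stored values whose key equals the key of
// the search term, and returns them in their original (encoded) form.
function findMatches(string $term, array $storedValues): array
{
    $needle = searchKey($term);

    return array_values(array_filter(
        $storedValues,
        static fn (string $stored): bool => searchKey($stored) === $needle
    ));
}

$rows = ['V&eacute;ra', 'Vera', 'Verna'];
print_r(findMatches('Vera', $rows)); // keeps 'V&eacute;ra' and 'Vera'
```

Because both sides are reduced to the same key, a search for Véra would also find the plain Vera row. Filtering in PHP means loading every candidate row first, so this only suits small result sets.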
All help appreciated.
Edit: I'm thinking this might be done more easily directly in the database; however, I'm running into issues with DQL. I know that there are ways of doing it in SQL with a stored procedure, but with limited access to the database, I'm open to any suggestions for dealing with this in Doctrine.
Okay, so maybe I'm making this too difficult.
All I need is a way of finding entries that have been HTML-encoded in the database without having to search for either the specific encoding or the diacritic itself. If I search for Jose, it should bring up anything in the database labeled as José.
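On the Doctrine side, one option that avoids stored procedures is to keep a pre-folded copy of the name in its own column and query that column with ordinary DQL. The sketch below assumes such a column can be added at all, which may not hold with limited database access. The Person entity, its fields and the fold() helper are hypothetical names, and the intl and mbstring extensions are assumed.

```php
<?php
use Doctrine\ORM\Mapping as ORM;

// Hypothetical entity; class and field names are illustrative only.
#[ORM\Entity]
class Person
{
    #[ORM\Id, ORM\GeneratedValue, ORM\Column(type: 'integer')]
    private ?int $id = null;

    // The name as it is stored today, e.g. "Jos&eacute;".
    #[ORM\Column(type: 'string')]
    private string $name = '';

    // Assumed extra column holding the folded form, e.g. "jose".
    #[ORM\Column(type: 'string')]
    private string $searchName = '';

    public function __construct(string $name)
    {
        $this->setName($name);
    }

    // Refreshes the folded copy whenever the name changes.
    public function setName(string $name): void
    {
        $this->name = $name;
        $this->searchName = self::fold($name);
    }

    public function getSearchName(): string
    {
        return $this->searchName;
    }

    // Decodes entities, removes combining accents and lowercases the result.
    public static function fold(string $value): string
    {
        $decoded = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $decomposed = Normalizer::normalize($decoded, Normalizer::FORM_D);
        if ($decomposed === false) {
            $decomposed = $decoded; // invalid UTF-8: keep the decoded text
        }
        $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $decoded;

        return mb_strtolower($stripped, 'UTF-8');
    }
}

// The search then folds the user's input the same way, for example:
//   $em->createQuery('SELECT p FROM Person p WHERE p.searchName = :key')
//      ->setParameter('key', Person::fold('Jose'))
//      ->getResult();
$person = new Person('Jos&eacute;');
echo $person->getSearchName(), "\n"; // jose
```

Existing rows would need a one-off backfill of the new column before searches on it return anything.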