I'm working on a query which I thought should be quite intuitive, but somehow I'm facing a bit of issues when implementing it. I guess what I'm trying to achieve is to match a string stored in MySQL DB without space and punctuation (other creative approaches are more than welcome). At the same time I would like the query to handle Unicode characters in diacritics insensitive fashion (so options like REGEXP are kinda out of luck). And the last condition is I'm on MySQL 5.5 with InnoDB engine, so full-text indexing is not supported (but I'm open to upgrade to 5.6/5.7 if it helps sorting this out).
Consider the scenario which the string Hello-World from John Doe is stored in DB. I would like to find it when given the search string HelloWorld or JohnDoe. To be more general, the string in DB can contain brackets, understores and any other punctuation (not limited to ASCII but can compromise for now), while the search string can be a combination of words with or without any separators in between. The closest I've gotten so far is to daisy chain the REPLACE function for a list of known punctuation, like below:
SELECT text FROM table WHERE REPLACE(REPLACE(text, '-', ''), ' ', '') LIKE '%JohnDoe%'
My questions are:
- Is there a better way instead of using the daisy chain above?
- If that's the only solution, how will the performance be impacted when I chain up hundred or more REPLACE functions?
Thanks in advance for your help.