I first started searching for a suitable REGEX solution but after reading this I decided to ask a new question.
I have a bug tracking webapp written in PHP/MySQL, it uses TinyMCE as a rich text editor for the bigger text fields like e.g. detailed description. Recently we have allowed pasting of images in the rich text fields. TinyMCE takes care of encoding the image as a base64 string and putting it inside an <img>
tag.
Now I've noticed the search results are not accurate anymore, because I do a simple Body LIKE '%searchstring%'
query. When searching for short strings there's a quite high chance one of the base64 encoded image strings will contain the string you're searching for. So from the users perspective he gets a search result which in no way contains the string he searched for.
So the question is how should I tackle this problem? MySQL does not have an HTML parser which is often suggested as a REGEX alternative. Yet currently the body fields of my webapp are stored in a longtext
SQL field. The body fields contains the complete HTML blob which we got from TinyMCE.
Is there a way to fix my search queries? Or do I have to store the data in a different way?
I already tried this solution, it works perfectly but it increases the query time (on slow development database) from 10s to 214s!! So it's not a realistic option.