0

I first started searching for a suitable REGEX solution but after reading this I decided to ask a new question.

I have a bug tracking webapp written in PHP/MySQL, it uses TinyMCE as a rich text editor for the bigger text fields like e.g. detailed description. Recently we have allowed pasting of images in the rich text fields. TinyMCE takes care of encoding the image as a base64 string and putting it inside an <img> tag.

Now I've noticed the search results are not accurate anymore, because I do a simple Body LIKE '%searchstring%' query. When searching for short strings there's a quite high chance one of the base64 encoded image strings will contain the string you're searching for. So from the users perspective he gets a search result which in no way contains the string he searched for.

So the question is how should I tackle this problem? MySQL does not have an HTML parser which is often suggested as a REGEX alternative. Yet currently the body fields of my webapp are stored in a longtext SQL field. The body fields contains the complete HTML blob which we got from TinyMCE.

Is there a way to fix my search queries? Or do I have to store the data in a different way?

I already tried this solution, it works perfectly but it increases the query time (on slow development database) from 10s to 214s!! So it's not a realistic option.

Community
  • 1
  • 1
Alex
  • 928
  • 1
  • 16
  • 30
  • 3
    Create a second column, and populate it with the text extracted from the HTML on save. Then search based on that. – Lucas Trzesniewski Sep 29 '15 at 09:21
  • Are these whole words that you are searching for? If so then you could add a space after (and maybe before) the sql statement Body LIKE '% searchstring %' –  Sep 29 '15 at 09:31
  • @jeff That won't work, yes they are whole words but they might be the first word in the body, or the last word in a sentence, ending with a period instead of a space. Lucas Trzesniewski's answer might be the solution. – Alex Sep 29 '15 at 09:51

0 Answers0