
I use a MySQL database for my primary storage, and I have a MyISAM table that caches everything I want to search with FULLTEXT. When I try to switch to Sphinx or Elasticsearch to improve this, an important problem appears.

Let's say I have 10 tables with more than 20 columns each. Each has 6-10 VARCHAR columns on which I want to run keyword searches. They also have date and integer columns that I want to filter by ranges and equality, for example. Currently, I cache all those VARCHAR values in a single MyISAM table with one FULLTEXT column, plus a few other columns identifying which table the row belongs to and its record id. Supposing the MyISAM table is cache_table and the FULLTEXT column is search_text, an example search query is:
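A minimal sketch of such a cache table (the column names and sizes here are assumptions, not my exact schema):

```sql
-- Hypothetical sketch of the MyISAM cache table; actual column
-- names and sizes in my schema may differ.
CREATE TABLE cache_table (
    `table`     VARCHAR(64)  NOT NULL,  -- source table name, e.g. 't1'
    record_id   INT UNSIGNED NOT NULL,  -- primary key of the source row
    search_text TEXT         NOT NULL,  -- all VARCHAR columns concatenated
    PRIMARY KEY (`table`, record_id),
    FULLTEXT KEY ft_search (search_text)
) ENGINE=MyISAM;
```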

SELECT SQL_CALC_FOUND_ROWS t1.* FROM t1
INNER JOIN cache_table AS c ON (c.`table` = 't1' AND c.record_id = t1.id)
WHERE MATCH(c.search_text) AGAINST ('+something*' IN BOOLEAN MODE)
      AND t1.date1 < CURDATE() AND t1.integer1 > 3 AND ...
ORDER BY t1.date
LIMIT 0, 100

I can move this data to any search engine, but then filtering the data on the MySQL side becomes a problem. I can use SphinxSE as a MySQL storage engine and keep the same SQL approach, but then pagination becomes the problem, because Sphinx has a configuration setting that caps the maximum number of records it can return. Let's say it is 1000. There is then a possibility that the search engine returns only 1000 of 20,000 matching records, none of which satisfy the other criteria in the query (e.g. t1.date1 and t1.integer1), whereas another 1000 records in the search engine's index could satisfy those criteria. In that case, the above query returns an empty result set even though matching records exist.
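As far as I can tell, SphinxSE lets you express two workarounds in the query string itself: raising maxmatches per query, or, better, declaring the non-text columns as Sphinx attributes and pushing the range filters into Sphinx, so the cap applies after filtering rather than before. A hypothetical sketch, assuming an ENGINE=SPHINX table se_t1 whose index carries date1 (as a unix timestamp) and integer1 as attributes:

```sql
-- Hypothetical SphinxSE sketch; se_t1 and its attribute names are
-- assumptions. The range filters run inside Sphinx, so maxmatches
-- caps rows that already satisfy the non-text criteria.
SELECT t1.*
FROM t1
INNER JOIN se_t1 AS s ON s.id = t1.id
WHERE s.query = 'something;mode=boolean;range=date1,0,1451347200;range=integer1,4,2147483647;maxmatches=10000;limit=100'
ORDER BY t1.date
LIMIT 0, 100;
```

The 1451347200 bound stands in for CURDATE() converted to a unix timestamp, which would have to be built into the query string by the application.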

Since I cannot cache the whole database in a search engine, how do you think this can be achieved?

Note: The reason I cannot move all the data to a search engine is that the database structure is highly dynamic. Users can create new modules, which means new tables, and they need to search those too. That's why there can be anywhere from ~5 to ~60 tables to search.
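For the dynamic per-module columns, one possibility (a sketch only, assuming Sphinx 2.1+ with JSON attribute support; the source/index names and the attrs column are mine, not from my schema) is to pack each table's variable columns into one JSON attribute at indexing time:

```
# Hypothetical sphinx.conf fragment. Requires Sphinx 2.1+ for
# sql_attr_json; module_attrs is an assumed helper table/view that
# serializes each row's customer-defined columns into JSON.
source src_modules
{
    type      = mysql
    # ... connection settings ...
    sql_query = \
        SELECT c.record_id AS id, c.`table`, c.search_text, m.attrs \
        FROM cache_table c \
        JOIN module_attrs m USING (`table`, record_id)

    sql_attr_string = table
    sql_attr_json   = attrs
}

index idx_modules
{
    source = src_modules
    path   = /var/lib/sphinx/idx_modules
}
```

Filters could then target JSON keys (e.g. attrs.integer1) without changing the index definition when customers add columns.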

abeyaz
  • The following Q&A might provide some useful info: http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage – Val Dec 29 '15 at 04:15
  • I have already read such comparisons and definitions. Neither explains the problem here, and I am not even sure there is a possible workaround. Maybe what I want to do is totally impossible this way. – abeyaz Dec 29 '15 at 10:26
  • Not enough info for a proper answer, but what you talk about could almost certainly be done with Sphinx, putting all the dates and range filters in as **attributes**. You can use JSON attributes if you want *varied* attributes per document. – barryhunter Dec 30 '15 at 22:16
  • @barryhunter thanks for the info, I didn't know about JSON attrs. That might help. However, my database is a bit complicated because of the customer-based dynamic structure. Customers can expand it by adding their own columns, so it won't be that easy to mirror the structure in Sphinx at the same time. – abeyaz Dec 31 '15 at 16:37
