I use a MySQL database for my primary storage, and I have a MyISAM table that caches everything so I can search it with FULLTEXT. When I try to use Sphinx or Elasticsearch to improve this, an important problem appears.
Let's say I have 10 tables with more than 20 columns each. Each of them has 6-10 varchar columns on which I want to run keyword searches. They also have some date and integer columns that I want to filter with ranges and equality checks. Currently I cache all those varchars in a single MyISAM table with one FULLTEXT column, plus a few extra columns that record which table each row belongs to and its record id (the table is sketched after the query below). Supposing the MyISAM table is cache_table and the FULLTEXT column is search_text, my example search query is:
SELECT SQL_CALC_FOUND_ROWS t1.* FROM t1
INNER JOIN cache_table AS c ON (c.`table` = 't1' AND c.record_id = t1.id)
WHERE MATCH(c.search_text) AGAINST ('+something*' IN BOOLEAN MODE)
AND t1.date1 < CURDATE() AND t1.integer1 > 3 AND ...
ORDER BY t1.date
LIMIT 0, 100
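For reference, the cache table behind that query looks roughly like this (a sketch; only search_text, the source-table column and the record id are fixed above, the remaining names and sizes are my assumptions):

-- Sketch of the MyISAM cache table described above:
CREATE TABLE cache_table (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `table` VARCHAR(64) NOT NULL,      -- name of the source table, e.g. 't1'
    record_id INT UNSIGNED NOT NULL,   -- id of the row in the source table
    search_text TEXT NOT NULL,         -- the 6-10 varchar columns concatenated
    PRIMARY KEY (id),
    UNIQUE KEY uq_source (`table`, record_id),
    FULLTEXT KEY ft_search (search_text)
) ENGINE=MyISAM;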
I can move this data to any search engine. But then filtering the data on the MySQL side becomes a problem. I could use SphinxSE as a MySQL storage engine so I can keep the same SQL approach, but then pagination becomes the problem, because Sphinx has a configuration setting that caps the maximum number of records it can return. Let's say the cap is 1,000. The search engine may then return only 1,000 of 20,000 matching records, none of which fit the other criteria in the query (e.g. t1.date1 and t1.integer1), while another 1,000 records in the search engine's index would have fit those criteria. In that case the query above returns an empty result set even though matching records exist.
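To make the problem concrete, here is roughly how the SphinxSE variant would look (a sketch; the proxy-table layout follows the SphinxSE docs, but the index name, port, and the direct join on t1.id are my assumptions for brevity; in practice the id would map back through cache_table):

-- SphinxSE proxy table pointing at the full-text index:
CREATE TABLE cache_sphinx (
    id BIGINT UNSIGNED NOT NULL,   -- document id in the Sphinx index
    weight INT NOT NULL,           -- relevance weight returned by Sphinx
    query VARCHAR(3072) NOT NULL,  -- search query plus options
    INDEX (query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/cache_index";

-- Sphinx hands MySQL at most maxmatches rows; the date/integer filters
-- are applied only afterwards, so matches beyond that window are lost:
SELECT t1.* FROM t1
INNER JOIN cache_sphinx AS s ON (s.id = t1.id)
WHERE s.query = '+something*;mode=boolean;limit=1000;maxmatches=1000'
AND t1.date1 < CURDATE() AND t1.integer1 > 3;

Everything Sphinx trims before MySQL sees it is unrecoverable by the outer WHERE clause, which is exactly the empty-result case described above.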
Since I cannot cache the whole database in a search engine, how do you think this can be achieved?
Note: the reason I cannot move all the data to a search engine is that the database structure is highly dynamic. Users can create new modules, which means new tables, and they need to search those too. That's why there may be ~5 tables to search, or maybe ~60.