1

I have a table of about 5M rows of data (articles). I have the below query for fulltext searching in the title of the articles in two different languages. The problem with it is that it take about 15 seconds to be executed. MySQL version: 5.6.29-log

Here is the query:

SELECT `id`, `title`, `title_fa` FROM
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`unique` AS `unique`, `p`.`date` AS `date` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE 1 AND MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)
    UNION
    SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`unique` AS `unique`, `p`.`date` AS `date` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE 1 AND MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10;

This is the table structure:

CREATE TABLE `articles` (
  `id` int(10) unsigned NOT NULL,
  `title` text COLLATE utf8_persian_ci NOT NULL,
  `title_fa` text COLLATE utf8_persian_ci NOT NULL,
  `description` text COLLATE utf8_persian_ci NOT NULL,
  `description_fa` text COLLATE utf8_persian_ci NOT NULL,
  `date` date NOT NULL,
  `unique` tinytext COLLATE utf8_persian_ci NOT NULL,
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;

ALTER TABLE `articles`
  ADD PRIMARY KEY (`id`),
  ADD KEY `unique` (`unique`(128)),
  ADD FULLTEXT KEY `TtlDesc` (`title`,`description`);
  ADD FULLTEXT KEY `Title` (`title`);
  ADD FULLTEXT KEY `faTtlDesc` (`title_fa`,`description_fa`);
  ADD FULLTEXT KEY `faTitle` (`title_fa`);
  MODIFY `id` int(10) unsigned NOT NULL AUTO_INCREMENT;

First improvement step:

By searching is SO I came across this post:

Combining UNION and LIMIT operations in MySQL query

Using the proposed method I changed my query as follow:

SELECT `id`, `title`, `title_fa` FROM
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) LIMIT 0,100
    UNION
    SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) LIMIT 0,100) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10

The performance was astonishing and the query took like 0.04 seconds to execute. The problem was with sorting that I liked to have more recent articles listed first but this query is not able to do so. Also I'm not sure how the next set of results (i.e. the next 10 results - second page of the results) could be retrieved and shown.

Second improvement step:

Searching more on SO I came across this one:

SQL Query - Using Order By in UNION

And my query looked like below:

SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)  ORDER BY `p`.`date` DESC LIMIT 0,20) AS `subQueryE`
    UNION ALL
SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
    (SELECT `f`.`id` AS `id`, `f`.`title` AS `title`, `f`.`title_fa` AS `title_fa`, `f`.`date` AS `date`, `f`.`unique` AS `unique` FROM `articles` `f` LEFT JOIN `authors` `a` ON  `f`.`unique` =  `a`.`unique` WHERE MATCH (`f`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)  ORDER BY `f`.`date` DESC LIMIT 0,20) AS `subQueryF`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10

The performance was better but not satisfying as it took about 7 seconds. It brought up another problem that even having GROUP BY unique duplicate rows was still present in the results.

Third step:

I did another test by executing the following query hoping to get better results:

SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) OR MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10

But the execution time was awful and reached more than 100 seconds.

Any help is more than welcome and thanks in advance.

Community
  • 1
  • 1
SAVAFA
  • 818
  • 8
  • 23
  • Can you check how long the following query needs? `SELECT p.id AS id, p.title AS title, p.title_fa AS title_fa, p.unique AS unique, p.date FROM articles p LEFT JOIN authors a ON p.unique = a.unique WHERE 1 AND MATCH (p.title) AGAINST ('"heat"' IN BOOLEAN MODE) order by p.id` (it's ordered by id, not date) and the same for the arabic version? Do you need the `authors` table? Do you have an index for `unique` on your `authors` table? – Solarflare Jul 21 '16 at 09:16
  • It took somewhere between 5 to 9 seconds in my tests. i didn't notice any significant differences between the two orderings and languages. Yes the author table is needed and it has index defined for `unique`. – SAVAFA Jul 21 '16 at 16:54
  • Then I suspect you won't get any faster than 5 to 9 seconds with a combined fulltext search either (combined column and just one match against), if you want to keep the ordering by date. MySQL is somewhat limited in the way it can combine fulltext indexes and other indexes, and I guess you reached a limit here, 7 seconds is usually unacceptable for a query. You might want to check out specialized search databases, like Solr or Elasticsearch for that. – Solarflare Jul 21 '16 at 18:09
  • Oh! Really? That doesn't sound good. – SAVAFA Jul 21 '16 at 18:51
  • Please document your version `select @@version` ... also I tweaked the tags. Some were meaningless, FTS was missing – Drew Jul 22 '16 at 12:19
  • The version is: 5.6.29-log – SAVAFA Jul 22 '16 at 13:48
  • What is the speed like if you put a `+` before the 'word'? Furthermore, please time each subquery separately; I want to see if only one is the slow part. – Rick James Aug 03 '16 at 05:23
  • @RickJames I timed queries individually. I couldn't get sure about the result I'll share because phpMyAdmin was giving me different execution time, but it seemed that the subquery with English search word is about 10 times faster than the one with Farsi/Persian search word. – SAVAFA Aug 17 '16 at 04:34
  • 10x - Interesting. And surprising. Could you time something else: `SELECT COUNT(*) FROM ... WHERE MATCH ...` -- that is nothing but the MATCH being timed. Also want to know if there are 10x as many Farsi matches. – Rick James Aug 17 '16 at 19:49
  • Here are my timings: 1. `SELECT count(*) FROM documents WHERE MATCH (title) AGAINST ('fast' IN BOOLEAN MODE)` the result was 12274 and took 0.54230975 seconds 2. `SELECT count(*) FROM documents WHERE MATCH (title_fa) AGAINST ('fast' IN BOOLEAN MODE)` the result was 325 and took 0.04076075 seconds 3. `SELECT count(*) FROM documents WHERE MATCH (title_fa) AGAINST ('سریع' IN BOOLEAN MODE)` the result was 31327 and took 4.26924525 seconds – SAVAFA Aug 17 '16 at 23:52
  • Notice how the time is roughly proportional to the number of rows returned. This puts you at the mercy of what the user chooses to search for. – Rick James Oct 08 '16 at 21:09

0 Answers0