I have a table of about 5M rows of data (articles). I have the below query for fulltext searching in the title of the articles in two different languages. The problem with it is that it take about 15 seconds to be executed. MySQL version: 5.6.29-log
Here is the query:
SELECT `id`, `title`, `title_fa` FROM
(SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`unique` AS `unique`, `p`.`date` AS `date` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE 1 AND MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)
UNION
SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`unique` AS `unique`, `p`.`date` AS `date` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE 1 AND MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10;
This is the table structure:
CREATE TABLE `articles` (
`id` int(10) unsigned NOT NULL,
`title` text COLLATE utf8_persian_ci NOT NULL,
`title_fa` text COLLATE utf8_persian_ci NOT NULL,
`description` text COLLATE utf8_persian_ci NOT NULL,
`description_fa` text COLLATE utf8_persian_ci NOT NULL,
`date` date NOT NULL,
`unique` tinytext COLLATE utf8_persian_ci NOT NULL,
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
ALTER TABLE `articles`
ADD PRIMARY KEY (`id`),
ADD KEY `unique` (`unique`(128)),
ADD FULLTEXT KEY `TtlDesc` (`title`,`description`);
ADD FULLTEXT KEY `Title` (`title`);
ADD FULLTEXT KEY `faTtlDesc` (`title_fa`,`description_fa`);
ADD FULLTEXT KEY `faTitle` (`title_fa`);
MODIFY `id` int(10) unsigned NOT NULL AUTO_INCREMENT;
First improvement step:
By searching is SO I came across this post:
Combining UNION and LIMIT operations in MySQL query
Using the proposed method I changed my query as follow:
SELECT `id`, `title`, `title_fa` FROM
(SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) LIMIT 0,100
UNION
SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) LIMIT 0,100) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10
The performance was astonishing and the query took like 0.04 seconds to execute. The problem was with sorting that I liked to have more recent articles listed first but this query is not able to do so. Also I'm not sure how the next set of results (i.e. the next 10 results - second page of the results) could be retrieved and shown.
Second improvement step:
Searching more on SO I came across this one:
SQL Query - Using Order By in UNION
And my query looked like below:
SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
(SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) ORDER BY `p`.`date` DESC LIMIT 0,20) AS `subQueryE`
UNION ALL
SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
(SELECT `f`.`id` AS `id`, `f`.`title` AS `title`, `f`.`title_fa` AS `title_fa`, `f`.`date` AS `date`, `f`.`unique` AS `unique` FROM `articles` `f` LEFT JOIN `authors` `a` ON `f`.`unique` = `a`.`unique` WHERE MATCH (`f`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) ORDER BY `f`.`date` DESC LIMIT 0,20) AS `subQueryF`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10
The performance was better but not satisfying as it took about 7 seconds. It brought up another problem that even having GROUP BY unique
duplicate rows was still present in the results.
Third step:
I did another test by executing the following query hoping to get better results:
SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) OR MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10
But the execution time was awful and reached more than 100 seconds.
Any help is more than welcome and thanks in advance.