2

I'm stuck with a query on a InnoDB table in a MySQL database. I need to find orders based on a fulltext search on two text fields which contain order and customer details in json encoded text. Here is the table schema:

+--------------+------------+------+-----+---------+----------------+
| Field        | Type       | Null | Key | Default | Extra          |
+--------------+------------+------+-----+---------+----------------+
| id           | int(11)    | NO   | PRI | NULL    | auto_increment |
| user_id      | int(11)    | NO   | MUL | NULL    |                |
| comment      | text       | NO   |     | NULL    |                |
| modified     | datetime   | NO   |     | NULL    |                |
| created      | datetime   | NO   | MUL | NULL    |                |
| items        | mediumtext | NO   | MUL | NULL    |                |
| addressinfo  | text       | NO   |     | NULL    |                |
+--------------+------------+------+-----+---------+----------------+
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| orders |          0 | PRIMARY  |            1 | id          | A         |       69144 |     NULL | NULL   |      | BTREE      |         |               |
| orders |          1 | user_id  |            1 | user_id     | A         |       45060 |     NULL | NULL   |      | BTREE      |         |               |
| orders |          1 | created  |            1 | created     | A         |       69240 |     NULL | NULL   |      | BTREE      |         |               |
| orders |          1 | search   |            1 | items       | NULL      |       69240 |     NULL | NULL   |      | FULLTEXT   |         |               |
| orders |          1 | search   |            2 | addressinfo | NULL      |       69240 |     NULL | NULL   |      | FULLTEXT   |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The table has around 150.000 rows. It has one fulltext index on the items and addressinfo column.

And here comes the query:

SELECT 
  id
FROM 
  orders 
WHERE 
  MATCH (items, addressinfo) AGAINST (
    '+simon* +white* ' IN BOOLEAN MODE
  ) 
ORDER BY 
  id DESC 
LIMIT 
20

This is the EXPLAIN result:

+----+-------------+--------+------------+----------+---------------+--------+---------+-------+------+----------+---------------------------------------------------+
| id | select_type | table  | partitions | type     | possible_keys | key    | key_len | ref   | rows | filtered | Extra                                             |
+----+-------------+--------+------------+----------+---------------+--------+---------+-------+------+----------+---------------------------------------------------+
|  1 | SIMPLE      | orders | NULL       | fulltext | search        | search | 0       | const |    1 |   100.00 | Using where; Ft_hints: no_ranking; Using filesort |
+----+-------------+--------+------------+----------+---------------+--------+---------+-------+------+----------+---------------------------------------------------+

On large resultsets the query takes around 30 seconds to process on a standard LAMP VM.

Without ordering by

ORDER BY id DESC
the query is processed much faster in around 0.6 seconds.

The only difference in the EXPLAIN result is that "Using filesort" is missing in the faster query. Measuring the query says that 98% of the processing time (27s) is used for "Creating Sort Index".

Is there any way to do the fulltext search on this table with ORDER BY in a reasonable processing time (less than a second)?

I already tried different approaches e.g. putting the order by column into the fulltext index (text_id as TEXT column) with no luck. The approach from here: How to make a FULLTEXT search with ORDER BY fast? is also not faster.

As the application runs on a shared host I'm very limited in optimizing MySQL ini values or Memory values.

Thanks a lot!

Simon
  • 91
  • 5
  • You have the usual option: normalize your data. Relational databases work best with structured data (the "s" in "sql" stands for "structured"). Also fulltext indexes thrive on rare words. Json-data is unstructured data. So while easy to store, it's harder to evaluate (try to list all products you sold last month). If you are e.g. looking for "mr. white", it is much faster to just look into columns you know contains (sur)names than to also find products ("white socks") or adresses ("white house"). As the list is both shorter and can be optimized for the `order by` too (without fulltext index). – Solarflare Aug 07 '17 at 16:50
  • If that is actually "just" a datadump (and you have the data elsewhere in a structured way, or do not need to evaluate it in any other way than to search in it), you could (maybe just additionally) use a different database/search engine like elasticsearch, solr or sphinx (this is an incomplete list without any particular order). They specialize in searching through big amounts of (unstructured) data, so it might be a better option than a relational database. – Solarflare Aug 07 '17 at 16:54
  • Thanks for your answer. It's indeed just a datadump, actually a copy of the complete order at the time it was made. I can search for articles and customer data using the corresponding tables but I hoped searching a single text column with fulltext index (which is very fast most of the time) would be easier and faster. I wasn't aware of the problem with fulltext and order by. – Simon Aug 07 '17 at 17:23
  • 1
    Well, it depends on how you are searching. A fulltext index has its purpose and can be a very useful tool. The disadvantage is that you cannot combine it with other indexes, so it has to return everything that contains the keywords anywhere, then sort it, then limit it to 20. This is very very effective if you are looking for rare words in the middle of a column (as any other method in mysql requires a full table scan), but gets less effective if everyone buys white socks. A full table scan (ordered by id) is much more efficient then, as it can just stop after it found 20 rows. – Solarflare Aug 07 '17 at 17:45
  • All right, thanks again. So it seems I have to find another solution. But good to understand the drawbacks of fulltext search in mysql. – Simon Aug 08 '17 at 06:42

1 Answers1

1

You might gain some time when using a delivered table. try it.

Query

SELECT
 orders.id
FROM (
  SELECT 
    id  
  FROM 
    orders 
  WHERE 
    MATCH (items, addressinfo) AGAINST (
      '+simon* +white* ' IN BOOLEAN MODE
    )  
) 
  AS 
    orders_match
INNER JOIN
 orders 
ON
 orders_match.id = orders.id

ORDER BY
 orders.id DESC

LIMIT 20
Raymond Nijland
  • 11,488
  • 2
  • 22
  • 34
  • Thanks! That indeed saved around 6 seconds but unfortunately thats still 22 seconds :/ I need to get under 2 or 3 seconds. Maybe that's not possible under that prerequisites?! – Simon Aug 07 '17 at 13:58