
I keep the categories of the products as [id][id] in the c_id column.

However, when I want to fetch the products belonging to a category, the query is very slow.

Is this due to the "[" and "]" characters in the search value?

If not, I would appreciate suggestions for alternative queries.


SELECT  p.*
FROM    (
        SELECT  id
        FROM    products
        WHERE   c_id LIKE '%[111][2]%'
        AND     status = '1'
        ORDER BY id DESC
        LIMIT 3
        ) pi
JOIN products p ON p.id = pi.id

Filtering on `status='1'` alone is fast, but listing by category is very slow.

CoinAnalyzer
  • Is this supposed to be some creative form of a separated list? Like `'[111][2]'` means a product belongs to category `111` and `2`? If yes, yes, that's a major flaw and isn't usually going to perform well. Normalize your schema. See ["Is storing a delimited list in a database column really that bad?"](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) (Spoiler: Yes, it is.). – sticky bit Jan 05 '22 at 20:46
  • 1
    And you joining the exact same table to a derived table just filtering and limiting that table seems weird, unless I overlook something. Just a `WHERE` in one query directly from the table should have the same effect and be way more readable and may even perform better (depending on how the optimizer actually rewrites your current query; it may also be possible they both yield the same plan) – sticky bit Jan 05 '22 at 20:51
  • @stickybit Yes 111 and 2 different categories. One is the main category and the other is its subcategory. Also, the reason I use join is because the order by function is slow. Because there are more than 500,000 records in the table. – CoinAnalyzer Jan 05 '22 at 20:59
  • "One is the main category and the other is its subcategory." -- Oh even worse. So when you relocate a category to another parent category, you'd have to update all products in and below the category as well. Or else your data is inconsistent. That should also be normalized. – sticky bit Jan 05 '22 at 21:06
  • If the column contents are actually `[111][2]`, then you shouldn't be using `LIKE` at all. Use of a wildcard as the first character means that every row in the table needs to be evaluated to see if it contains the part after the wildcard. Also, you could vastly improve the performance of your queries if you separate the values into two separate columns (`maincategory` and `subcategory`, for example), so that the columns can be properly indexed. – Ken White Jan 05 '22 at 21:06
  • "the reason I use join is because the order by function is slow." -- Hmm, OK. TBH, it surprises me a little why sorting in the inner query should be any cheaper, but if your plans suggest that... – sticky bit Jan 05 '22 at 21:08
  • @stickybit Yes, when there is an update in the categories, I need to update the products in that category. I think it has become necessary to keep the products belonging to the category in another table. But there is a problem, if a product belongs to 2 categories, it means there will be 2 records. 1 million records for 500,000 products ? Sample; product_categories p_id (int) c_id (int) – CoinAnalyzer Jan 05 '22 at 21:10
  • Yes, a linking table is the way to go. The amount of records seldomly is a problem in a modern relational DBMS when the data is appropriately indexed. Denormalization *can* be a last resort in dealing with performance optimization but also can easily have the opposite effect plus certainly having other negative effects. Don't prematurely "optimize" through denormalization. – sticky bit Jan 05 '22 at 21:14
  • @stickybit In summary, what do you suggest? Shall I create the table in the example and keep it? – CoinAnalyzer Jan 05 '22 at 21:16
  • Yes, first normalize the schema. Then adapt your queries. Then put proper indexes and see how it goes. If you encounter any issue thereafter, post a new question with a proper [example] and plan to get further help from thereon. – sticky bit Jan 05 '22 at 21:18
  • Do not use a single string column in the products table for your categories/subcategories. Each [category][subcategory] pair should be a separate row of at least 4 columns in a separate table. The minimum 4 columns should be a unique key for this table, a foreign key to the products id, the category and the subcategory. far more flexible to maintain your data this way and far easier to query. – Paul Maxwell Jan 06 '22 at 00:58
  • Run `SHOW CREATE TABLE products`, show sample data and run `SELECT version();`. [Update your question](https://stackoverflow.com/posts/70599246/edit) with those results. `LIKE '%[111][2]%' ` usually performs badly compared to `LIKE '[111][2]%' ` and the latter can utilize index on the column (if any) while the former won't. – FanoFN Jan 06 '22 at 01:17
  • @stickybit - I explain the advantage of the Join _for this case_ in my Answer. – Rick James Jan 09 '22 at 07:19
  • @RickJames: Hmm, that doesn't really explain anything for, like *why* or *when*, at least to me. The only thing I can vaguely imagine is that there is an index that could be used for a efficient execution and a quirk in the optimizer that leads to using that index in the subquery but not in an almost equivalent (apart from the projection) stand alone query. That's certainly possible but not universal. – sticky bit Jan 09 '22 at 14:51
  • @stickybit - It's not the index, it is being able to whittle down the number of rows to a small number (3). Then the outer query only needs to fetch `LONGTEXT` values. Without the derived table, the Optimizer was probably loading lots of Longtexts before getting to the filtering. – Rick James Jan 09 '22 at 22:05
  • @RickJames: You mean MySQL isn't smart enough not to load certain external (as not in the main table data, where there are only pointers to it) columns when it scans the table for the result in one (outer) `SELECT` instead of loading only the required data of the external columns afterwards (if needed)? But it suddenly can, when it joins the table to a derived table? Maybe because it then can use the index on the PK and leaves out all columns at first? Hmm, I don't know the internals and shortcomings of the implementation there, but that could explain the OP's findings, yes. – sticky bit Jan 09 '22 at 22:28
  • @stickybit - MySQL's Optimizer has many inadequacies; that may well be one of them. The original Optimizer code was designed for MyISAM, which did not have the "pointers" like InnoDB has. The code does keep improving; MariaDB and MySQL are playing tag; each is better than the other in some obscure corners. (Competition is "good" for both of them.) – Rick James Jan 10 '22 at 00:06
  • 1
    I can explain the details of many `SELECTs`, but I am unsure about what I am saying for this particular one. Because the Optimizer is still rather "simple-minded", I can _usually_ read a `SELECT` and predict how the query will be executed. Sometimes, I have to say "maybe this way; maybe that way". – Rick James Jan 10 '22 at 00:08

1 Answer


The naughty LIKE

Maybe...

If you have a "category" and an optional "subcategory", then consider the following approach:

  • Any c_id starting with "[111]" belongs to category 111.
  • c_id = "[111][2]" refers to subcategory 2 in category 111.
  • That could be carried further with sub-subcategories, etc.

THEN, these become very efficient; note the lack of a leading wildcard:

WHERE c_id LIKE '[111]%'

WHERE c_id = '[111][2]'  -- if you knew it was two levels only, or

WHERE c_id LIKE '[111][2]%'  -- if you allow 'two or more' levels

However, asking for items with subcategory '[2]' would still be inefficient due to the leading wildcard:

WHERE c_id LIKE '%[2]%'

FYI, a LIKE without any wildcards optimizes as a simple =.

(You might consider the more traditional notation: "111/2".)
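For the prefix patterns above to be fast, the column needs an index the optimizer can range-scan. A minimal sketch, assuming the table layout implied by the question (products with c_id and status columns; the index name is made up):

```sql
-- Assumed schema: products(id, c_id, status, ...)
ALTER TABLE products ADD INDEX idx_status_cid (status, c_id);

-- Can use the index: equality on status, then a range scan on the c_id prefix
SELECT  id
FROM    products
WHERE   status = '1'
  AND   c_id LIKE '[111]%'
ORDER BY id DESC
LIMIT 3;
```

With the leading-wildcard pattern `'%[111][2]%'`, the same index cannot be range-scanned and every row must still be checked.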

Why the JOIN helps in this case

As for why this can be better:

SELECT p.*
    FROM ( SELECT id ... FROM p WHERE ... ORDER BY ... LIMIT 3 ) AS a
    JOIN p  USING(id)

Note the LIMIT (or GROUP BY). The "derived table" (that subquery) may be optimizable, and it delivers only 3 ids. Then only 3 lookups into p are needed for the outer SELECT, which is much less effort.

Without this contorted formulation, it may involve fetching p.* for lots of rows before filtering and sorting down to just 3 rows. That can be bulky and inefficient.

Without the subquery, it may have fetched all the LONGTEXT rows and then kept only 3 of them. That is, I/O was the villain. (I would need to see the actual queries -- with and without the join -- plus SHOW CREATE TABLE, to further explain my point.)

Many-to-many

Yes, since many products map to many categories, do use a 'linking' table. A discussion of such a schema: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
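A minimal sketch of such a linking table, using the column names the OP proposed in the comments (p_id, c_id); the index choices are assumptions, not a prescription:

```sql
-- Hypothetical linking table: one row per (product, category) pair
CREATE TABLE product_categories (
    p_id INT NOT NULL,          -- FK to products.id
    c_id INT NOT NULL,          -- FK to a categories table
    PRIMARY KEY (p_id, c_id),   -- product -> categories lookups
    INDEX (c_id, p_id)          -- category -> products lookups
) ENGINE=InnoDB;

-- Products in category 2, using the same derived-table pattern as above:
SELECT  p.*
FROM    (
        SELECT  pc.p_id
        FROM    product_categories pc
        WHERE   pc.c_id = 2
        ORDER BY pc.p_id DESC
        LIMIT 3
        ) pi
JOIN products p ON p.id = pi.p_id;
```

Two categories per product does mean two rows per product, but a million small, well-indexed integer rows is routine for InnoDB and far cheaper than a leading-wildcard LIKE over 500,000 rows.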

Rick James