0

There are 3 tables and I linked three tables with one query. But the problem is that the result returns in 7-8 seconds. One of the columns I join is JSON type and the data in it is json. For this reason, I did the join process with JSON. How can I make the following query faster? Or can I achieve the same result using another method?

Here is a table example for sales with json column:

sales table example

And the query:

SELECT
    `s`.`id` AS `id`,
    `s`.`status` AS `status`,
    group_concat(`c`.`display_name` separator ', ') AS `buyer`,
    `u`.`display_name` AS `partner`,
    `s`.`sales_price` AS `sales_price`,
FROM `sales` AS `s`
LEFT JOIN `users` AS `u` ON (`u`.`id` = `s`.`user_id`)
JOIN `contacts` AS `c` ON (json_contains(`s`.`buyers`, concat('"', `c`.`id`, '"')))
GROUP BY
    `s`.`id`

My tables:

SALES

column_name column_type
id integer
buyers varchar(255)
user_id integer
status integer
sales_price double(13,2)

CONTACTS

column_name column_type
id integer
display_name varchar(255)

USERS

column_name column_type
id integer
display_name varchar(255)

SALES.user_id => USERS.id SALES.buyers => CONTACTS.id

Sales.buyers can be multiple or single like this:

If the sale has one contact: ["774"] or If the sale has two contacts: ["774", "854"]

I want to see this output:

S.id S.status C.display_name AS BUYER U.display_name AS PARTNER S.sales_price
1 1 Michael Owen David Beckham 199999
2 1 Ariel Ortega, Mauro Icardi Leo Messi 219000
3 0 Nicholas Anelka, Didier Drogba Thierry Henry 710000

EXPLAIN result is here:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY ALL 12455638
2 DERIVED s ALL 857 Using temporary; Using filesort
2 DERIVED u eq_ref PRIMARY PRIMARY 4 s.user_id 1
2 DERIVED c ALL 14533 Using where; Using join buffer (flat, BNL join)
3 UNION s2 ALL buyers_index 857 Using where
3 UNION u2 eq_ref PRIMARY PRIMARY 4 s2.user_id 1
UNION RESULT <union2,3> ALL
O. Jones
  • 103,626
  • 17
  • 118
  • 172
grudge
  • 11
  • 4
  • 2
    Please share more details, like the table structure and the execution plan for that query – Nico Haase Mar 01 '23 at 10:09
  • ``JOIN `contacts` AS `c` ON (json_contains(`s`.`buyers`, concat('"', `c`.`id`, '"')))`` - for starter, i'd question myself if i have to use json or just go with the good old one-to-many through foreign keys. – Bagus Tesa Mar 01 '23 at 10:12
  • 1
    Please share the execution plan of your query – Nico Haase Mar 01 '23 at 10:33
  • 1
    "EXPLAIN result is here: https://pastebin.com/PCvsiuRP" - please add it **directly** to your question, not to any external ressource. Also, please use the common tabular format, not `FORMAT=JSON` – Nico Haase Mar 01 '23 at 10:45
  • Ok @Nico Haase, i added it directly to my post. – grudge Mar 01 '23 at 11:13
  • [Why should I not upload images of code/data/errors when asking a question?](https://meta.stackoverflow.com/q/285551/3404097) – philipxy Mar 01 '23 at 15:23
  • Debug questions require a [mre]. Please clarify via edits, not comments. Put all & only what is needed to ask in your post, not just at a link. [ask] [Help] – philipxy Mar 01 '23 at 15:25
  • [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/q/3191623/3404097) [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) etc etc – philipxy Mar 01 '23 at 15:27
  • [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/q/3653462/3404097) – philipxy Mar 01 '23 at 15:31
  • From my perspective your question is quite good (after you added some details). I hope to see more contributions from you here, @grudge. – O. Jones Mar 01 '23 at 22:13

1 Answers1

1

You have correctly identified the source of your performance trouble as the arrays of contacts.id values you store in your sales.status columns, in your case using the JSON format. SQL truly does not work well with data designs containing arrays in columns, whether JSON, comma-separated lists, or whatever.

Specifically, your ON (json_contains(s.buyers, concat('"', c.i, '"'))) condition cannot exploit any index -- it isn't sargable -- so it's inherently slow.

It looks like you have a many-to-many relationship between rows of sales and contacts. The SQLish way to express that is with a junction table. You might call it sales_buyers. It will contain rows like this.

sales_id   contact_id
   1       774
   2       774
   2       854

These sample rows mean sales.id 1 has a single buyer, contact.id 774. sales.id 2 has two buyers, 774 and 854.

Once you have that kind of junction table you can change your query to look like this:

...
FROM sales AS s
LEFT JOIN users AS u    ON u.id = s.user_id
JOIN sales_buyers AS sb ON s.id = sb.sales_id
JOIN contacts AS c      ON sb.contact_id = c.id 
...

(Notice also that you need GROUP BY s.id, u.display_name for your query to be correct.)

Your junction table will have this definition.

CREATE TABLE sales_buyers (
    sales_id INT NOT NULL,
    contact_id INT NOT NULL,
    PRIMARY KEY (sales_id, contact_id),
    INDEX contact_sales (contact_id, sales_id)
) ENGINE=InnoDB;

This table definition has indexes going both ways. That helps queries be faster whether you start with contacts or sales.

You create a sales - to - buyer relationship by INSERTing a row into this table, and remove a relationship by DELETEing the row.

Pro tip Skip the backticks. Don't use reserved words for column or table names and you don't need them. And they just make SQL harder to type and to read.

O. Jones
  • 103,626
  • 17
  • 118
  • 172