Joining with a json column makes the result very slow

Question

There are 3 tables and I linked three tables with one query. But the problem is that the result returns in 7-8 seconds. One of the columns I join is JSON type and the data in it is json. For this reason, I did the join process with JSON. How can I make the following query faster? Or can I achieve the same result using another method?

Here is a table example for sales with json column:

sales table example

And the query:

SELECT
    `s`.`id` AS `id`,
    `s`.`status` AS `status`,
    group_concat(`c`.`display_name` separator ', ') AS `buyer`,
    `u`.`display_name` AS `partner`,
    `s`.`sales_price` AS `sales_price`,
FROM `sales` AS `s`
LEFT JOIN `users` AS `u` ON (`u`.`id` = `s`.`user_id`)
JOIN `contacts` AS `c` ON (json_contains(`s`.`buyers`, concat('"', `c`.`id`, '"')))
GROUP BY
    `s`.`id`

My tables:

SALES

column_name	column_type
id	integer
buyers	varchar(255)
user_id	integer
status	integer
sales_price	double(13,2)

CONTACTS

column_name	column_type
id	integer
display_name	varchar(255)

USERS

column_name	column_type
id	integer
display_name	varchar(255)

SALES.user_id => USERS.id SALES.buyers => CONTACTS.id

Sales.buyers can be multiple or single like this:

If the sale has one contact: ["774"] or If the sale has two contacts: ["774", "854"]

I want to see this output:

S.id	S.status	C.display_name AS BUYER	U.display_name AS PARTNER	S.sales_price
1	1	Michael Owen	David Beckham	199999
2	1	Ariel Ortega, Mauro Icardi	Leo Messi	219000
3	0	Nicholas Anelka, Didier Drogba	Thierry Henry	710000

EXPLAIN result is here:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY		ALL					12455638
2	DERIVED	s	ALL					857	Using temporary; Using filesort
2	DERIVED	u	eq_ref	PRIMARY	PRIMARY	4	s.user_id	1
2	DERIVED	c	ALL					14533	Using where; Using join buffer (flat, BNL join)
3	UNION	s2	ALL	buyers_index				857	Using where
3	UNION	u2	eq_ref	PRIMARY	PRIMARY	4	s2.user_id	1
	UNION RESULT	<union2,3>	ALL

Please share more details, like the table structure and the execution plan for that query — Nico Haase, Mar 01 '23 at 10:09
``JOIN `contacts` AS `c` ON (json_contains(`s`.`buyers`, concat('"', `c`.`id`, '"')))`` - for starter, i'd question myself if i have to use json or just go with the good old one-to-many through foreign keys. — Bagus Tesa, Mar 01 '23 at 10:12
"EXPLAIN result is here: https://pastebin.com/PCvsiuRP" - please add it **directly** to your question, not to any external ressource. Also, please use the common tabular format, not `FORMAT=JSON` — Nico Haase, Mar 01 '23 at 10:45
[Why should I not upload images of code/data/errors when asking a question?](https://meta.stackoverflow.com/q/285551/3404097) — philipxy, Mar 01 '23 at 15:23
Debug questions require a [mre]. Please clarify via edits, not comments. Put all & only what is needed to ask in your post, not just at a link. [ask] [Help] — philipxy, Mar 01 '23 at 15:25
[How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/q/3191623/3404097) [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) etc etc — philipxy, Mar 01 '23 at 15:27
[Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/q/3653462/3404097) — philipxy, Mar 01 '23 at 15:31
From my perspective your question is quite good (after you added some details). I hope to see more contributions from you here, @grudge. — O. Jones, Mar 01 '23 at 22:13

score 1 · Accepted Answer · answered Mar 01 '23 at 12:00

You have correctly identified the source of your performance trouble as the arrays of contacts.id values you store in your sales.status columns, in your case using the JSON format. SQL truly does not work well with data designs containing arrays in columns, whether JSON, comma-separated lists, or whatever.

Specifically, your ON (json_contains(s.buyers, concat('"', c.i, '"'))) condition cannot exploit any index -- it isn't sargable -- so it's inherently slow.

It looks like you have a many-to-many relationship between rows of sales and contacts. The SQLish way to express that is with a junction table. You might call it sales_buyers. It will contain rows like this.

sales_id   contact_id
   1       774
   2       774
   2       854

These sample rows mean sales.id 1 has a single buyer, contact.id 774. sales.id 2 has two buyers, 774 and 854.

Once you have that kind of junction table you can change your query to look like this:

...
FROM sales AS s
LEFT JOIN users AS u    ON u.id = s.user_id
JOIN sales_buyers AS sb ON s.id = sb.sales_id
JOIN contacts AS c      ON sb.contact_id = c.id 
...

(Notice also that you need GROUP BY s.id, u.display_name for your query to be correct.)

Your junction table will have this definition.

CREATE TABLE sales_buyers (
    sales_id INT NOT NULL,
    contact_id INT NOT NULL,
    PRIMARY KEY (sales_id, contact_id),
    INDEX contact_sales (contact_id, sales_id)
) ENGINE=InnoDB;

This table definition has indexes going both ways. That helps queries be faster whether you start with contacts or sales.

You create a sales - to - buyer relationship by INSERTing a row into this table, and remove a relationship by DELETEing the row.

Pro tip Skip the backticks. Don't use reserved words for column or table names and you don't need them. And they just make SQL harder to type and to read.

Joining with a json column makes the result very slow

1 Answers1