Assuming your table_item
looks like this:
create table table_item (
item_id int unsigned auto_increment not null,
record varchar(50),
primary key (item_id)
);
insert into table_item (record) values
('Record A'),
('Record B'),
('Record C'),
('Record D'),
('Record E'),
('Record F'),
('Record G'),
('Record H');
table_item_linked
could then be
create table table_item_linked (
linked_id int unsigned auto_increment not null,
item1_id int unsigned not null,
item2_id int unsigned not null,
linked_by int unsigned not null,
linked_timestamp timestamp not null default now(),
primary key (linked_id),
unique key (item1_id, item2_id),
index (item2_id, item1_id),
foreign key (item1_id) references table_item(item_id),
foreign key (item2_id) references table_item(item_id)
);
This is basically a many-to-many relation between items of the same type.
Note that you usually don't need an AUTO_INCREMENT column here. You can remove it, and define (item1_id, item2_id)
as PRIMARY KEY. And linked_by
should be a FOREGN KEY referencing the users
table.
If a user (with ID 123
) wants to link "Record A" (item_id = 1
) with "Record B" (item_id = 2
) and "Record B" (item_id = 2
) with "Record C" (item_id = 3
), your INSERT statements would be:
insert into table_item_linked (item1_id, item2_id, linked_by) values (1, 2, 123);
insert into table_item_linked (item1_id, item2_id, linked_by) values (2, 3, 123);
Now - When the user selects "Record A" (item_id = 1
), you can get all related items with a recursive query (Requires at least MySQL 8.0 or MariaDB 10.2):
set @input_item_id = 1;
with recursive input as (
select @input_item_id as item_id
), rcte as (
select item_id from input
union distinct
select t.item2_id as item_id
from rcte r
join table_item_linked t on t.item1_id = r.item_id
union distinct
select t.item1_id as item_id
from rcte r
join table_item_linked t on t.item2_id = r.item_id
)
select i.*
from rcte r
join table_item i on i.item_id = r.item_id
where r.item_id <> (select item_id from input)
The result will be:
item_id record
———————————————————
2 Record B
3 Record C
db-fiddle
In your application you would remove set @input_item_id = 1;
and change select @input_item_id as item_id
using a placeholder to select ? as item_id
. Then prepare the statement and bind item_id
as parameter.
Update
If the server doesn't support recursive CTEs, you should consider to store redundat data in a separate table, which is simple to query. A closure table would be an option, but it's not necessery here, and might consume too much storage space. I would group items that are connected together (directly and indirectly) into clusters.
Given the same schema as above, we define a new table table_item_cluster
:
create table table_item_cluster (
item_id int unsigned not null,
cluster_id int unsigned not null,
primary key (item_id),
index (cluster_id, item_id),
foreign key (item_id) references table_item(item_id)
);
This table links items (item_id
) to clusters (cluster_id
). Since an item can belong only to one cluster, we can define item_id
as primary key. It's also a foreign key referencing table_item
.
When a new item is created, it's not connected to any other item and builds an own cluster. So when we insert a new item, we need also to insert a new row in table_item_cluster
. For simplicity we identify the cluster by item_id
(item_id = cluster_id
). This can be done in the application code, or with the following trigger:
delimiter //
create trigger table_item_after_insert
after insert on table_item
for each row begin
-- create a new cluster for the new item
insert into table_item_cluster (item_id, cluster_id)
values (new.item_id, new.item_id);
end//
delimiter ;
When we link two items, we simply merge their clusters. The cluster_id
for all items from the two merged clusters needs to be the same now. Here I would just take the least one of two. Again - we can do that in application code or with a trigger:
delimiter //
create trigger table_item_linked_after_insert
after insert on table_item_linked
for each row begin
declare cluster1_id, cluster2_id int unsigned;
set cluster1_id = (
select c.cluster_id
from table_item_cluster c
where c.item_id = new.item1_id
);
set cluster2_id = (
select c.cluster_id
from table_item_cluster c
where c.item_id = new.item2_id
);
-- merge the linked clusters
update table_item_cluster c
set c.cluster_id = least(cluster1_id, cluster2_id)
where c.item_id in (cluster1_id, cluster2_id);
end//
delimiter ;
Now - When we have an item and want to get all (directly and indirectly) linked items, we just select all items (except of the given item) from the same cluster:
select i.*
from table_item i
join table_item_cluster c on c.item_id = i.item_id
join table_item_cluster c1
on c1.cluster_id = c.cluster_id
and c1.item_id <> c.item_id -- exclude the given item
where c1.item_id = ?
db-fiddle
The result for c1.item_id = 1
("Record A") would be:
item_id record
———————————————————
2 Record B
3 Record C
But: As almost always when dealing with redundant data - Keeping it in sync with the source data can get quite complex. While it is simple to add and merge clusters - When you need to remove/delete an item or a link, you might need to split a cluster, which may require writing recursive or iterative code to determine which items belong to the same cluster. Though a simple (and "stupid") algorithm would be to just remove and reinsert all affected items and links, and let the insert triggers do theit work.
Update 2
Last but not least: You can write a stored procedure, which will iterate through the links:
delimiter //
create procedure get_linked_items(in in_item_id int unsigned)
begin
set @ids := concat(in_item_id);
set @ids_next := @ids;
set @sql_tpl := "
select group_concat(distinct id order by id) into @ids_next
from (
select item2_id as id
from table_item_linked
where item1_id in ({params_in})
and item2_id not in ({params_not_in})
union all
select item1_id
from table_item_linked
where item2_id in ({params_in})
and item1_id not in ({params_not_in})
) x
";
while (@ids_next is not null) do
set @sql := @sql_tpl;
set @sql := replace(@sql, '{params_in}', @ids_next);
set @sql := replace(@sql, '{params_not_in}', @ids);
prepare stmt from @sql;
execute stmt;
set @ids := concat_ws(',', @ids, @ids_next);
end while;
set @sql := "
select *
from table_item
where item_id in ({params})
and item_id <> {in_item_id}
";
set @sql := replace(@sql, '{params}', @ids);
set @sql := replace(@sql, '{in_item_id}', in_item_id);
prepare stmt from @sql;
execute stmt;
end//
delimiter ;
To get all linked items of "Record A" (item_id = 1
), you would use
call get_linked_items(1);
db-fiddle
To explain it in pseudo code:
- Initialize
@ids
and @ids_next
with the input parameter
- Find all item IDs which are directly linked to any ID in
@ids_next
except of those, which are already in @ids
- Store the result into
@ids_next
(overwrite it)
- Append IDs from
@ids_next
to @ids
(merge the two sets into @ids
)
- If
@ids_next
is not empty: GOTO step 2.
- Return all items with IDs in
@ids