Problem
I have 2 tables:
Table tTag
idTag int
otherColumns
And
Table tTagWord
idTagWord int
idTag int
idWord int
position int
For example:
So each idTag will have multiple idTagWord (unknown number), the position is important too. I try to find the best way, for the performance, to find the duplicates.
A duplicate would be to have the same idWords in the same order (position) for 2 different idTag.
What I have tried
SELECT GROUP_CONCAT(DISTINCT tab.idTag SEPARATOR ',') INTO @idTagSet
FROM ( SELECT idTag,GROUP_CONCAT(idWord order by position ASC SEPARATOR ' ') AS Tag
FROM tTagWord
GROUP BY idTag) AS tab
INNER JOIN (SELECT idTag,GROUP_CONCAT(idWord order by position ASC SEPARATOR ' ') AS Tag
FROM tTagWord
GROUP BY idTag) AS tab2 ON tab.Tag = tab2.Tag
WHERE tab.idTag <> tab2.idTag;
The previous query returns a set of the duplicate idTags, so it works. But the performance is terrible. With 150 000 idTag, it already takes several minutes and the table will soon have millions of idTag.
I also tried something like this answer
select idTag, GROUP_CONCAT(idWord order by position ASC SEPARATOR '-') AS idWordSet
from tTagWord
group by idTag
Having COUNT(idWordSet) > 1;
But I can't seem to find a way. Any idea?