I've been using a JPQL query like this:
SELECT s
FROM Sentence s
INNER JOIN s.words sw
WHERE s.date = :date
AND sw IN (:words)
GROUP BY s
HAVING count(sw) = :numberOfWords
In SQL s.words would be the linking table in the many to many :word is the collection of words we are looking to match :numberOfWords is the number of words in the collection
For those not familiar with JPQL, the similar SQL is found here: SQL query to find matches for multiple criteria
I'm running into a problem and I don't know if it's a design or a query problem.
So I have sentences (s) and words (w) and the sentence_word (sw) table joins the two. For example sentence "The lead dog of the group is blue" and word collection of [ the , cat ], this query will select this sentence as a match. It's a false positive because it contains "the" twice and does not contain "cat".
In SQL I think it comes out as:
SELECT s.id
FROM sentence s
JOIN sentence_word sw ON ( sw.s_id = s.id )
JOIN word w ON ( w.id = sw.w_id )
WHERE w.word IN ( 'the', 'cat' )
GROUP BY s.id
HAVING COUNT(1) = 2
The easiest solution I can think of is to just disallow duplicates in the linking table so instead of two links for the in this sentence, there is just one (thus the linking table stores that the sentence contains the word the, just not how many times). However, I think there will be times where I would want to search for a sentence that contains "the" many times so I would prefer not to use this option.
Help please!
Resources
Some good examples of similar searches and performance can be found here (thanks to the answerer for the link): How to filter SQL results in a has-many-through relation