1

Overview

I'm simplifying the description of my application to make the query clearer. The application stores saved phrases. These phrases are each assigned a category. The phrases can be posted by users. An English description of my query:

I need to pass a category and user into the prepared statement. The result will include only phrases with the category passed as an argument. From the phrases of this category, only phrases will be returned where there does not exist any posted instances of this phrase for the user argument with the timePosted field in the last two weeks.

Minimal Schema

savedPhrase

  • phraseIdentifier - INT UNSIGNED
  • textContent - VARCHAR
  • associatedCategory - VARCHAR
  • ___primary key is phraseIdentifier

postedPhrase

  • savedPhrase - INT UNSIGNED - foreign key references savedPhrase(phraseIdentifier)
  • username - VARCHAR
  • timeReposted - TIMESTAMP
  • ___primary key is combination of all listed fields

Current Best Attempt with Behavior

I'm self-taught/learning MySQL and believe I struggle with joins. I think I should be using at least one left outer join as seen in the graphic below. The left table will be savedPhrase and the right will be postedPhrase.

Left Outer Join

SELECT * FROM savedPhrase S 
LEFT JOIN postedPhrase P
ON (S.phraseIdentifier = P.savedPhrase AND P.username = ?)
WHERE
(P.savedPhrase IS NULL OR P.timeReposted < DATE_SUB(NOW(), INTERVAL 14 day)) AND
S.associatedCategory = ?

The query above ALMOST works. However, once a single postedPhrase row exists at least two weeks ago, it will return the joined savedPhrase, even if one has been posted on the same account with the same identifier less than two weeks ago.

Further Discussion

The place where I'm having difficulty is the "absence condition". This is where a savedPhrase must not exist within the last two weeks. In the previous paragraph I explained how the absence is not functioning because the postedPhrases are being returned based on their own independent timeReposted. It seems I should be adding the absence condition to the right circle of the left outer join. Simplification of when to apply the conditions to the ON versus the WHERE would be very helpful.

I've worked at fixing this problem for a while and would appreciate any direction. Please let me know if I can include any other information.

Matt Goodrich
  • 4,875
  • 5
  • 25
  • 38
  • 1
    Most of those Venn diagram presentations are not illustrating what they claim to be. Left/right join returns what inner join returns plus unmatched left/right rows extended by nulls. Where happens after all joins. For a Venn diagram, the rows of inner join are in the intersection and the left/right circle is left/right join rows. So unmatched rows extended by nulls are in the left & right lobes. Full join is both circles. Re joins see [this answer](http://stackoverflow.com/a/25957600/3404097) and its links. Re querying see [this answer](http://stackoverflow.com/a/33952141/3404097). – philipxy May 18 '17 at 08:19
  • 1
    PS Your Venn diagram illustrates SQL EXCEPT aka DIFFERENCE, for two tables that have the same set of columns. Left/right circles are the input tables, left lobe the result. But left outer join is *part* of an *idiom* for expressing that. (You can also use NOT IN & NOT EXISTS. Google stackoverflow.) But the diagram is only obscurely related to the left join. And the code is even more obscurely related. – philipxy May 18 '17 at 08:32
  • @philipxy Thanks - I really appreciate all of the information! I will take some time to thoroughly read all of it before making any additional comments. – Matt Goodrich May 18 '17 at 08:52

1 Answers1

2

You don't have to to use JOIN to archive the result that you want. You could write the query base on the English description of the query which described on your question.

SELECT S.* 
FROM   savedPhrase S 
WHERE  S.associatedCategory = ? 
       AND NOT EXISTS (SELECT 1 
                       FROM   postedPhrase P 
                       WHERE  P.username = ? 
                              AND S.phraseIdentifier = P.savedPhrase 
                              AND P.timeReposted >= DATE_SUB(NOW(), 
                                                    INTERVAL 14 day))
Trung Duong
  • 3,475
  • 2
  • 8
  • 9
  • Perfect, thank you so much! I completely forgot about the NOT EXISTS keyword combination. I don't need to use a join, since I don't need any information from the `postedPhrase` table, right? – Matt Goodrich May 18 '17 at 08:50