-1

I have a MySQL 5.6 database and fail to make a complex query, which (i think) involve multiple joins.

The first table (T1) is Hobby_ideas_articles and records looks like this:

hobby_idea_article_id= 1,
hobby_id = 6
url= 'http://exo.example.com',
author = 'john@example.com'

hobby_idea_article_id= 2,
hobby_id = 3
url= 'http://exo.example2.com',
author = 'john@example.com'

hobby_idea_article_id= 3,
hobby_id = 6
url= 'http://exo.example3.com',
author = 'eric@example.com'

hobby_idea_article_id= 4,
hobby_id = 7
url= 'http://exo.example4.com',
author = 'john@example.com'

hobby_idea_article_id= 5,
hobby_id = 6
url= 'http://exo.example5.com',
author = 'nestor@example.com'

hobby_idea_article_id= 7,
hobby_id = 6
url= 'http://exo.example6.com',
author = 'ginger@example.com'

hobby_idea_article_id= 8,
hobby_id = 6
url= 'http://exo.example8.com',
author = 'derek@example.com'

hobby_idea_article_id= 9,
hobby_id = 6
url= 'http://exo.example9.com',
author = 'derek@example.com'

The second table (T2) called Past_Customer_sent_messages lists ALL messages from the past, notably with the information of who sent it (customer_id -> pointing to another table called Customers which is not needed in this current question) and to whom (recipient), and records look like this

past_customer_sent_message_id = 5
hobby_id = 7,
customer_id = 4,
recipient = "john@example.com",
sent_at= "2019-09-10 00:00:00"

past_customer_sent_message_id = 6
hobby_id = 999,
customer_id = 4,
recipient = "eric@example.com",
sent_at= "2019-09-18 00:00:00"

past_customer_sent_message_id = 7
hobby_id = 999,
customer_id = 4,
recipient = "nestor@example.com",
sent_at= "2019-07-18 00:00:00"

Even if there is currently no foreign key between T1 and T2, one critical information is that the T1 'author' is the 'recipient" that will appear on T2 if a customer sends a message to him.

I'm not sure if it matters for the present question, but there is a Customer table (a past_customer_sent_messages belongs to a customer, a customer has multiple past_customer_sent_messages) and a Hobbies table (a Hobby_ideas_article belongs to a hobby but a hobby can have multiple Hobby_ideas_articles).

In my javascript file, before writing the SQL statement, I have 2 inputs with already given values I'll need to use at some point in the SQL statement:

  • hobby_id = HOBBY_ID_INPUT_I_HAVE , for example 6

  • customer_id = CUSTOMER_ID_INPUT_I_HAVE, for example 4

What I want is to retrieve ALL the records from the Hobby_ideas_articles table (T1) where hobby_id = HOBBY_ID_INPUT_I_HAVE with 2 conditions/restrictions:

  • condition1: the recipient (ex: 'john@example.com'), no matter WHAT (i.e no matter the hobby_id-S = plural it's connected to via 'author' column on T1, and no matter the customer_id-S = plural it's connected to via 'recipient' on T2), has never been sent an message (sent_at column on T2) in the past 3 hours. Again, this applies across all past messages to this 'recipient' (very important it means not only on the T1 'authors' that would match hobby_id = HOBBY_ID_INPUT_I_HAVE = 6).

  • condition2: the author has not been messaged in the past 30 days by the customer who has customer_id = CUSTOMER_ID_INPUT_I_HAVE. Here it applies to this specific customer (but across any hobby!).

A part I find particularly challenging intellectually as I would be tempted to put as a WHERE hobby_id = HOBBY_ID_INPUT_I_HAVE (see below) but it feels wrong because then, how to do I implement the condition1 which applies to ALL the messages, and should not be applied to only those with hobby_id = HOBBY_ID_INPUT_I_HAVE...hard! Same kind of difficulty for condition2

Maybe I could/should add a new column called hobby_idea_id that I would put on T2 which would be a foreign key between the 2 tables as a Past_Customer_sent_message belongs in a way to a Hobby_ideas_article and a Hobby_ideas_article has many Past_Customer_sent_messages ?

Today I have this but I don't even know on what to put the 'ON' clause of the join...and can't set the correct statement.

SELECT             
          hia.query_result_id,
          hia.url,
          hia.title,
          hia.article_publication_day,
          hia.media_name,
          hia.author_full_name,
          hia.author_first_name,
          hia.author_last_name,
          hia.author_email,
          ces.sent_at
FROM
          Hobby_ideas_articles hia
LEFT JOIN
          Past_Customer_sent_messages ces
ON
          # WHAT CLAUSE ???
WHERE
          hia.hobby_id = HOBBY_ID_INPUT_I_HAVE AND         
          hia.author IS NOT NULL
          # more conditions but how to assert them ??

How to achieve this ?

The desired output would be:

hobby_idea_article_id= 5,
hobby_id = 6
url= 'http://exo.example5.com',
author = 'nestor@example.com'
last_email_sent_at = "2019-07-18 00:00:00"

hobby_idea_article_id= 7,
hobby_id = 6
url= 'http://exo.example6.com',
author = 'ginger@example.com'
last_email_sent_at = null

hobby_idea_article_id= 8,
hobby_id = 6
url= 'http://exo.example8.com',
author = 'derek@example.com'
last_email_sent_at = null

As you can see with my inputs values given as examples (hobby_id = 6 , customer_id = 4),

  • 'john@example.com' and any hobby_ideas related to him are not part of the output because john has been messaged by customer_id = 4 in the past 30 days already

  • 'Eric' and the hobby_ideas related to him could have been part of the output because Eric was messaged by customer_id=6 8 days ago (and not customer_id=4 which is the value we target), BUT unfortunately it was excluded/filtered out because of condition1 : he was messaged at some point 2 hours ago by another customer, so he's out too.

  • nestor and the hobby_idea related to him are inside the output because he's unaffected by both because he indeed is related to hobby_id= 6, but his only mail was sent a few months ago.

  • ginger and the hobby_idea related to her are inside the output because condition1 and2 are ok given that ginger never emailed anybody: ther's no message at all where she is the recipient

  • Also note that if an author would be in the output but has multiple hobby_ideas related to him, then i only want one, that's why derek only appear once with let's say the FIRST hobby_idea but the second one he's one is removed from the output

Disclaimer: I come from a Rails background where I used to have ORM (Active Record) making all the chains/joins/ easier more automagic and am a bit lost here with the present SQL statement I must write inside my javascript aws lambda file.

Mathieu
  • 4,587
  • 11
  • 57
  • 112
  • The `ON` condition should presumably be `ON hia.hobby_id = ces.hobby_id`. – Barmar Sep 18 '19 at 23:42
  • Are you sure you want `LEFT JOIN`? That will return articles that don't have any messages. – Barmar Sep 18 '19 at 23:45
  • that's the core issue: I don't think so. I want all hobby ideas WHERE hobby_id is equal toa constant I have (let's say 6, that means this data does NOT come from T2) but...where I need ces.hobby_id would be more inside the WHEREs... – Mathieu Sep 18 '19 at 23:45
  • 1
    Could you show the desired result from the sample data? – Barmar Sep 18 '19 at 23:47
  • What does `hobby_id-S = plural` mean? – Barmar Sep 18 '19 at 23:48
  • i'm not sure it's a left join, i'm a sql rookie...maybe an inner join would be better. As my where clause does not involve 2 tables...maybe i dont need even a Join...but then how to make my condition1 and 2 without a Join, that's my issue...maybe the join could be on t1.author=t2.recipient? – Mathieu Sep 18 '19 at 23:48
  • @Barmar thanks for the help. I worte plural to make very clear there are multiple hobbies, hence hobby_id-s, related to a given author as an author can appear in multiple hobbiy_ideas – Mathieu Sep 18 '19 at 23:49
  • I'll write now the desired output – Mathieu Sep 18 '19 at 23:49
  • I guess it's `ON hia.author = ces.recipient`? – Barmar Sep 19 '19 at 00:01
  • Added more examples to show you based on my inputs(coming form my javascript calculations BEFORE my SQL statement, meaning they're available to use when doing the SQL) what would be the output – Mathieu Sep 19 '19 at 00:15
  • Also added notes to explain why each person is inside the output or left outside – Mathieu Sep 19 '19 at 00:16
  • The desired output doesn't have the `sent_at` column, which is the only reason for joining with `ces`. – Barmar Sep 19 '19 at 00:16
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199657/discussion-between-barmar-and-mathieu). – Barmar Sep 19 '19 at 00:17
  • yes it has sent_at...I do need some data from T2 inside the select.that's why i thought i need a join – Mathieu Sep 19 '19 at 00:18

1 Answers1

1

Use recipient NOT IN (SELECT ...) and author NOT IN (SELECT ...) to exclude the participants that match your conditions 1 and 2.

SELECT             
          hia.query_result_id,
          hia.url,
          hia.title,
          hia.article_publication_day,
          hia.media_name,
          hia.author_full_name,
          hia.author_first_name,
          hia.author_last_name,
          hia.author_email,
          ces.sent_at
FROM
          Hobby_ideas_articles hia
LEFT JOIN
          Past_Customer_sent_messages ces
ON
          hia.author = ces.recipient 
            AND ces.recipient NOT IN (
                SELECT recipient
                FROM Past_Customer_sent_messages
                WHERE sent_at > DATE_SUB(NOW(), INTERVAL 3 HOUR)
            )
WHERE
          hia.hobby_id = HOBBY_ID_INPUT_I_HAVE AND         
          hia.author IS NOT NULL
          AND hia.author NOT IN (
            SELECT recipient
            FROM Past_Customer_sent_messages
            WHERE customer_id = CUSTOMER_ID_INPUT_I_HAVE
            AND sent_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
          )

The condition on ces has to be in the ON clause so that you'll get sent_at = NULL for articles that don't have any matching rows.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • Added a response with full code here for discussion before editjng the post https://gist.github.com/misner/ef04a465434d544d89d146c95519b00f – Mathieu Sep 19 '19 at 10:06
  • Also created a new question with the basis of your answer to address a question regarding high concurrency and locks (beyond the simple select..for update): https://stackoverflow.com/questions/58012844/my-sql-5-6-how-to-prevent-reads-on-rows-selected-by-a-previous-sql-statement-p – Mathieu Sep 19 '19 at 14:06