8

Check out this SQL Fiddle for a simplified version of my issue http://sqlfiddle.com/#!9/cf31d3/1

I have 2 tables - chat messages and chat recipients that look like this:

enter image description here

Sample ChatMessages data:

enter image description here

Sample ChatRecipients data:

enter image description here

Basically I want to query only messages that contain a set of user IDs - for example, show only messages exchanged between Bob, Susan, and Chelsea. If I pull up a new chat window with user IDs (1, 2, 3) what is the best way to get messages ONLY involving those 3 people?

Here's a simplified version of my current query (which does not produce the correct result):

SELECT
  cm.message_id as 'message_id',
  cm.from_id    as 'from_id',
  (SELECT u.user_fname as 'fname' from Users u where u.user_id = cm.from_id) as 'firstName',
  (SELECT u.user_lname as 'lname' from Users u where u.user_id = cm.from_id) as 'lastName',
  cm.chat_text  as 'chat_text'
FROM
  ChatMessages cm
INNER JOIN
  ChatRecipients cr
ON
  cm.message_id = cr.message_id
INNER JOIN
  Users u
ON
  cm.from_id = u.user_id
WHERE
  cm.from_id in ('1', '2', '3')
AND
  cr.user_id in ('1', '2', '3')

I understand that using the 'IN' operator is not correct for this situation, but I'm a bit stuck. Thanks to anyone willing to help!

EDIT:

My sample output returns every row of data that any of the aforementioned user IDs are contained in and looks like this:

enter image description here

My goal is to limit the output to only messages where EVERY user ID I test for is associated with a message_id. For example, if message_id 32 is FROM user_id 7 and TO user_id(s) 11 & 3, I want to retrieve that record. Conversely, if message_id 33 is FROM user_id 7 and to user_id(s) 11 & 4 I do not want to retrieve that record.

Robert
  • 981
  • 1
  • 15
  • 24
  • 1. why do you use JOIN ON instead of a simple FROM ChatMessages cm, ChatRecipients cr, Users u WHERE cm.message_id = cr.message_id AND cm.from_id = u.user_id AND ... your where...? 2. what "which doesn't produce the correct result" means? What is expected, and what do you get? – Gavriel Jan 21 '16 at 02:09
  • Hey @Gavriel thanks for the reply. I need to inner join each of these tables to limit the output to data that meets my criteria in all 3 tables. In my output, I currently get all messages that contain ANY of the user IDs mentioned because I'm using the 'IN' statement. I can post sample output if that would be helpful. – Robert Jan 21 '16 at 02:14
  • please add the examples, because we don't read your mind (what should be the correct output)... – Gavriel Jan 21 '16 at 02:19
  • Ok cool I will post sample vs desired output. Writing it up now... – Robert Jan 21 '16 at 02:23
  • Count how many distinct people are in a conversation and compare that to how many relevant people are in the conversation – Strawberry Jan 23 '16 at 09:33

7 Answers7

5

The problem here is that your message must either be:

  • from user 1 and received by 2, 3, ...N
  • from user 2 and received by 1, 3, ...N
  • ...
  • from user N and received by 1, 2, ...N-1

and you need a query capable of scaling reasonably, i.e., no single JOIN for every recipient or things like that.

Let's start with the "from" part.

SELECT m.* FROM ChatMessages AS m
    WHERE from_id IN ($users)

Now I need to know what recipients these messages have.

SELECT m.* FROM ChatMessages AS m
    JOIN ChatRecipients AS r ON (m.message_id = r.message_id)
    WHERE from_id IN ($users)

Recipients may be good or bad and I'm interested in how many they are. So

SELECT m.*,
    COUNT(*) AS total,
    SUM(IF(user_id IN ($users), 1, 0)) AS good
FROM ChatMessages AS m
    JOIN ChatRecipients AS r ON (m.message_id = r.message_id)
    WHERE from_id IN ($users)
GROUP BY m.message_id;

Finally

A message is acceptable if it's between my [1...N] users, which means that it has exactly N-1 recipients, N-1 of them good.

SELECT m.*,
    COUNT(*) AS total,
    SUM(IF(user_id IN ({$users}), 1, 0) AS good
FROM ChatMessages AS m
    JOIN ChatRecipients AS r ON (m.message_id = r.message_id)
    WHERE from_id IN ({$users})
GROUP BY m.message_id
HAVING total = good AND good = {$n}

Test

In this case with three id's we have $users = 1,2,3 and $n = 2

SELECT m.*,
    COUNT(*) AS total,
    SUM(IF(user_id IN (1,2,3), 1, 0)) AS good
FROM ChatMessages AS m
    JOIN ChatRecipients AS r ON (m.message_id = r.message_id)
    WHERE from_id IN (1,2,3)
GROUP BY m.message_id
HAVING total = good AND good = 2


message_id  from_id     chat_text
1           2           Message from Susan to Bob and Chelsea
2           3           Message from Chelsea to Bob and Susan
3           1           Message from Bob to Chelsea and Susan
LSerni
  • 55,617
  • 10
  • 65
  • 107
  • This is the best solution posted, thank you Iserni. The real key to this problem, I believe, is counting the total number of recipients and checking to make sure the total number of recipients that matched the user's we're checking for match. Thanks a lot, you earned the bounty - I'm getting a message saying I have to wait 8 hours though, so I'll come back and award the points tonight. – Robert Jan 23 '16 at 16:26
1

add:

'GROUP BY message_id HAVING COUNT(DISTINCT cr.user_id)=2'

The general case in php instead of 2: count($otherUserIds)

See it in action: http://sqlfiddle.com/#!9/bcf1b/13 See also some explanation: Matching all values in IN clause

Community
  • 1
  • 1
Gavriel
  • 18,880
  • 12
  • 68
  • 105
  • I'll check out the link. If I return values based on the COUNT of other people in the chat I'll get many undesired values though. I'm trying to communicate that I need only messages that have been exchanged amongst a particular group of people. – Robert Jan 21 '16 at 02:51
  • I told you should ADD this to your query. This will have less results than yours. And if I'm right it'll be what you want – Gavriel Jan 21 '16 at 02:53
  • Unfortunately this won't work for my needs - I do appreciate the effort though. – Robert Jan 21 '16 at 04:30
  • Ok I think my question is quite clear, but I'll try to restate it more simply. Bob, Susan, and Chelsea are all in the same chat together. Later on Bob pulls up a new chat windows and sends a message to Susan and Chelsea. I need to be able to retrieve past messages that included only those 3 people. The reason that counting other recipients wouldn't work is because I could then retrieve every time Bob chatted with Tim and Jack or every time Susan chatted with Ryan and Danielle. Does that make sense? – Robert Jan 21 '16 at 17:12
  • Please read the answer carefully again, then instead of guessing, try the code, and if it doesn't give the expected results then let us know. Not that "you think it won't work", nor "it didn't work" won't help us. Exact input, output please! – Gavriel Jan 21 '16 at 17:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/101326/discussion-between-gavriel-and-robert). – Gavriel Jan 21 '16 at 17:18
1

Answering your question:

If I pull up a new chat window with user IDs (1, 2, 3) what is the best way to get messages ONLY involving those 3 people?

You can use the following query:

SELECT q_ur.user_fname, q_ur.user_lname, q_cm.chat_text
         FROM Users q_ur INNER JOIN 
              ChatMessages q_cm
           ON q_ur.user_id = q_cm.from_id
WHERE q_cm.message_id in (
SELECT cr.message_id FROM ChatMessages cm INNER JOIN 
              ChatRecipients cr
         ON cm.message_id = cr.message_id
    WHERE cm.from_id IN (1,2,3)
      AND cr.user_id IN (1,2,3)
group by cr.message_id
having count(*) = 2)

Expressions: cm.from_id IN (1,2,3) AND cr.user_id IN (1,2,3) filter messages related to people in the same chat. To filter messages related to people 1 <-> 2 and 1 <-> 3 and 2 <-> 3 I have user having count(*) = 2. 2 used for filtering all messages whose destinations < or > then number of people in chat - 1.

So to use this query you have to specify two parameters (in three places): the first parameter is ID of people in the same chat, the second is the number of people - 1 in this chat.

And you will not retrieve other charts with three people in them where only one of this (1,2,3) triple are participating. To make sure checkout following link:

SQL Fiddle to test query.

Mikhailov Valentin
  • 1,092
  • 3
  • 16
  • 23
  • Hey, I found the problem in my approach after you commented. (I deleted that). Also I see your approach is better than that. Only one suggestion from me to make it accurate as OP requires. See this fiddle http://sqlfiddle.com/#!9/756e2/4, where message 1 shouldn't be included because it involves user 4 also as receiver. So just add an extra condtion `AND NOT EXISTS(select NULL FROM ChatRecipients WHERE NOT user_id IN (1,2,3) AND message_id = cm.message_id)`. Fiddle - http://sqlfiddle.com/#!9/756e2/5 – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 23 '16 at 08:57
  • @TaReQ MahMooD, thanks for the suggestion, but from my point of view message ONE from your example, shouldn't be included in the result. – Mikhailov Valentin Jan 23 '16 at 10:21
  • Yeah, that's what I'm trying to say, but your query will include that. See both my fiddle and you will know the difference. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 23 '16 at 10:27
1

For this type of scenarios, I would better suggest a different type of database structure to create a message thread with all the users involved and instead of connecting each messages to each users, connect them to the threads. Here are the sample tables:

MessageThreads

| thread_id | created_at          | 
-----------------------------------
|      1    | 2016-01-20 18:24:36 |
|      2    | 2016-01-20 19:24:24 |

ThreadRecipients

| thread_id |  user_id  |  last_read_message  | 
-----------------------------------------------
|      1    |      1    |        2            |
|      1    |      2    |        3            |
|      1    |      3    |        1            |

ChatMessages (like before)

| message_id |  from_id  |  chat_text         |     chat_datetime   |
---------------------------------------------------------------------
|      1    |      1    |        Test         | 2016-01-20 18:24:36 |
|      1    |      1    |        Test2        | 2016-01-20 19:24:36 |
|      1    |      2    |        Test3        | 2016-01-20 19:34:36 |

ThreadMessages

| thread_id |  message_id |
---------------------------
|      1    |      1      |
|      1    |      2      |
|      1    |      3      |

Here, inplace of isRead field in your ChatRecipients table, I have used last_read_message in ThreadRecipients table, where you can always update with the last seen message of the user in the thread. But if you want to still keep a such track of each message to each user, you can still have another table with only message_id and user_id where data will be inserted only if the message is read by the user. (And You can still use your ChatRecipients table for one to one messages, if you don't want to create thread in that case. )

Why It's necessary

It's because if you use ChatRecipients table you are adding multiple rows to ChatRecipients table for each message, in the long run its going to cost you some extra space. But if you use ThreadMessages as I suggested, you will put only one row per message in ThreadMessages and users will be connected to threads through ThreadRecipients table which will be one row per user per thread.

For example, if you have thread of 100 users with 50 messages, in your approach you will have 50 x 100 rows in ChatRecipients table. But with this approach, it will be 100 rows in ThreadRecipients table and 50 rows in ThreadMessages table. Just think about the difference.

How to Insert Data

So, when you have a new message thread between a group of persons. As for your example, we have three users with ID 1, 2, 3.

  1. Insert a new thread to ThreadRecipients table. Get the new thread_id. (It can be an auto incremented value)
  2. Now for each associated user_id, insert one row in ThreadRecipients table. For example, We have thread_id 3 and user_id 1, 2 , 3.

    INSERT INTO ThreadRecipients (thread_id, user_id) VALUES(3, 1), (3, 2), (3, 3)
    
  3. Now, when any person send messages to the thread, just insert the row to ChatMessages table (like before), get the message_id and Insert a new row to ThreadMessages with thread_id and message_id. For example our message_id = 9.

    INSERT INTO ThreadMessages (thread_id, message_id) VALUES(3, 9)
    
  4. When anyone reads the message, just update the last_read_message for the user in the ThreadRecipients table with the read message_id (the condition last_read_message < 3 makes sure that, the message you are updating with isn't older than the existing last_read_message).

    UPDATE ThreadRecipients SET last_read_message = 3 WHERE user_id = 2 AND thread_id = 3 AND last_read_message < 3 
    

Note: Always before inserting new thread, check if a thread already exists with the same users so that you don't have duplicate thread for the same group of users. (See below for how to find existing thread for specific users).

How to Get Messages

Now, your query should only check if there is a thread involving the specific users and no other users are involved in the thread. So, in WHERE clause

  1. First we have a sub query SELECT COUNT(*) FROM ThreadRecipients WHERE user_id in ('1', '2', '3') AND thread_id = tm.thread_id) and we are checking if that equals 3. It will be 4, if the number of users is 4 and so on. (Keep a UNIQUE key for thread_id + user_id, so that there can never be a data duplication and get an incorrect count match thereby).

  2. The other condition makes sure, there is no other user involved so we are just checking if any row exists WHERE NOT user_id IN ('1', '2', '3') AND thread_id = tm.thread_id). If exists, we will just consider it as another thread involving more persons.

So, finally the query can be like this : (See the SQL Fiddle)

SELECT
    cm.message_id as 'message_id',
    cm.from_id    as 'from_id',
    (SELECT u.user_fname as 'fname' from Users u where u.user_id = cm.from_id) as 'firstName',
    (SELECT u.user_lname as 'lname' from Users u where u.user_id = cm.from_id) as 'lastName',
    cm.chat_text  as 'chat_text'
FROM
    ChatMessages cm
INNER JOIN
    ThreadMessages tm
ON
    cm.message_id = tm.message_id
INNER JOIN
    Users u
ON
    cm.from_id = u.user_id
WHERE
    (SELECT COUNT(*) FROM ThreadRecipients WHERE user_id in ('1', '2', '3') AND thread_id = tm.thread_id) = 3
    AND NOT EXISTS(select NULL FROM ThreadRecipients WHERE NOT user_id IN ('1', '2', '3') AND thread_id = tm.thread_id)
  • Thanks for the answer @TaReQ. I'm actually a bit confused about why you would replace the ChatRecipients table with a ThreadMessages table, and I was unable to get your query return results between just 2 users, like you would need to do when only users 1 and 2 were having a conversation. – Robert Jan 23 '16 at 16:45
  • Sorry for not mentioning that you need to change the counter also when number of users change. So when you have two user it should be like `(SELECT COUNT(*) FROM ThreadRecipients WHERE user_id in ('1', '2') AND thread_id = tm.thread_id) = 2`. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 23 '16 at 16:52
  • I did try that, but still wasn't able to produce the correct result when only users 1 & 2 were involved. – Robert Jan 23 '16 at 16:54
  • Regarding change of table, it's because if you use ChatRecipients table you are adding multiple rows to ChatRecipients table for each message, in the long run its going to cost you some extra space. But if you use ThreadMessages as I suggested, you will put only one row per message in ThreadMessages and users will be connected to threads through ThreadRecipients table which will be once per thread. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 23 '16 at 17:00
  • For example, if you have thread of 100 users with 50 messages, in your approach you will have 50 x 100 rows in ChatRecipients table. But with my approach, it will be 100 rows in ThreadRecipients table and 50 rows in ThreadMessages table. Just think about the difference. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 23 '16 at 17:00
  • 1
    I will create a fiddle with your 2 users example with more explanations to the answer after 8 hours. (I am just in bed and almost ready for sleep :) ) Not sure if you are clear about my approach yet. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 23 '16 at 17:07
  • @Robert Here is the fiddle for your 2 user example http://sqlfiddle.com/#!9/58bc3/2 and check above for more explanations of the approach. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Jan 24 '16 at 01:36
1

Your reasoning seems sound. I have a simplified version of your query which seems to work:

SELECT 
  ChatMessages.message_id,
  ChatMessages.from_id,
  Users.user_fname,
  Users.user_lname,
  ChatMessages.chat_text,
  ChatRecipients.user_id as 'to_id'
FROM ChatMessages
INNER JOIN Users
ON ChatMessages.from_id=Users.user_id
INNER JOIN ChatRecipients
ON ChatRecipients.message_id=ChatMessages.message_id
WHERE ChatMessages.from_id IN (1, 3, 4)
AND ChatRecipients.user_id IN (1, 3, 4);

Check the SQLFiddle here to see it working. Your use of the IN clause is fine, but you shouldn't be putting the quotes in there because it is an integer, not a string you are matching.

HenryTK
  • 1,287
  • 8
  • 11
  • Thanks for the answer, Henry. Unfortunately, this solution produces incorrect results. For example, if you query using user_id (1,3) you'll pull up messages sent by 1 or 3 in group chats to multiple people where we only wanted messages exclusively between those 2 users. Check out @Iserni's answer for the correct solution. – Robert Jan 23 '16 at 16:31
0

You can try this

SqlFiddle Demo

SELECT 
cm.message_id as 'message_id',
cm.from_id as FromID,
cr.user_id as ToID,
(SELECT CONCAT(user_fname," ",user_lname) from Users where Users.user_id=cm.from_id ) as 'sender_name',
(SELECT CONCAT(user_fname," ",user_lname) from Users where Users.user_id=cr.user_id ) as 'recipient_name',
cm.chat_text  as 'chat_text'
FROM ChatRecipients cr
INNER JOIN ChatMessages cm ON cm.message_id = cr.message_id  
WHERE cr.user_id in (1, 2, 3)
and cm.from_id in (1, 2, 3)
GROUP BY cr.user_id
HAVING COUNT(cr.user_id)>=2
Uttam Kumar Roy
  • 2,060
  • 4
  • 23
  • 29
  • 1
    Thanks for the answer @Uttam. This query will not return the correct data when you only have 2 people messaging each other. Check out Iserni's answer for the correct solution. – Robert Jan 23 '16 at 16:48
  • Thanks for the kind reply and thank you for your help! – Robert Jan 23 '16 at 17:11
0

Thanks to everyone who provided an answer. @Iserni has answered my question correctly I believe, although I do think the second argument in the WHERE clause as I've posted below is necessary. None of the test cases in my SQL Fiddle example would have caused Iserna's query to produce an incorrect result though, so that's on me.

I was actually able to solve my problem a few hours prior to seeing Iserna's solution, so I figured I'd post what worked for me in case it can help anyone:

SELECT
  cm.message_id as 'message_id',
  cm.from_id    as 'from_id',
  (SELECT u.user_fname as 'fname' from Users u where u.user_id = cm.from_id) as 'firstName',
  (SELECT u.user_lname as 'lname' from Users u where u.user_id = cm.from_id) as 'lastName',
  cm.chat_text  as 'chat_text',
  (SELECT COUNT(DISTINCT cr.user_id) as 'uid' FROM ChatRecipients cr WHERE cr.message_id = cm.message_id) as 'countDistinct'
FROM
  ChatMessages cm
INNER JOIN
  ChatRecipients cr
ON
  cm.message_id = cr.message_id
INNER JOIN
  Users u
ON
  cm.from_id = u.user_id
WHERE
  cm.from_id in ('1', '2', '3')
AND
  cr.user_id in ('1', '2', '3')
GROUP BY
  cm.message_id
HAVING
  countDistinct = 2
AND
  COUNT(DISTINCT cr.user_id) = 2

They key to solving this problem is that you must count the number of distinct message recipients, which must be equal to N-1 of the total number of people involved in the message. You must also count the number of user_id's that you supply the query and make sure that that you only get values for messages intended for N-1 of the users represented. That sort of double checking logic makes this problem somewhat difficult.

Here's what this query looks like in a real scenario with dynamic input if anyone is interested.

SELECT
   DISTINCT cm.message_id as 'message_id',
   cm.from_id    as 'from_id',
   (SELECT u.user_fname as 'fname' from Users u where u.user_id = cm.from_id) as 'firstName',
   (SELECT u.user_lname as 'lname' from Users u where u.user_id = cm.from_id) as 'lastName',
   cm.chat_text  as 'chat_text',
   cm.chat_datetime as 'datetime',
   (SELECT COUNT(DISTINCT cr.user_id) as 'uid' FROM ChatRecipients cr WHERE cr.message_id = cm.message_id) as 'countDistinct'
FROM
   ChatMessages cm
INNER JOIN
   ChatRecipients cr
ON
   cm.message_id = cr.message_id
INNER JOIN
   Users u
ON
   cm.from_id = u.user_id
WHERE
   cm.from_id in ('$tempUid', '". implode("','", array_map('trim', $otherUserIds)) ."')
AND
   cr.user_id in ('$tempUid', '". implode("','", array_map('trim', $otherUserIds)) ."')
GROUP BY
   cm.message_id
HAVING
   countDistinct = ". count($otherUserIds) ."
AND
   COUNT(DISTINCT cr.user_id) = ". count($otherUserIds) ."
ORDER BY
   cm.chat_datetime DESC
LIMIT
   $paginationConstant OFFSET $offsetVal
Robert
  • 981
  • 1
  • 15
  • 24