37

I am creating a threaded message system much like gmail and facebook where the inbox lists the most recent threads displaying the subject, the sender's name, and time-stamp of the most recent message.

Here's How my tables are set up:

users:
    user_id
    user_name

thread:
    thread_id
    title
    to_id
    to_keep
    to_read
    from_id
    from_keep
    date

message:
    message_id
    thread_id
    to_id
    from_id
    message_text
    date

What I'm doing right now is when a user creates a new message, it creates a new thread in the thread table and then a new message in the message table and if a user responds to a thread, it duplicates the current thread in the thread table except it swaps the to_id and from_id and then creates a new message based on that.

Also, for the inbox view, I'm able to just query all threads based on a user_id. so something like SELECT * FROM thread WHERE to_id = 2 and to_keep = TRUE ORDER BY date DESC or if I want to view messages in the outbox it would be something like SELECT * FROM thread WHERE from_id = 2 and from_keep = TRUE ORDER BY date DESC.

If a user opens a thread when there's a new message, then to_read is updated to true UPDATE thread SET to_read = TRUE WHERE thread_id = 4.

I feel like I'm over complicating this process and that there should be a better way to do this.

Any help or ideas would be appreciated.

This way let's me just select everything from the thread table and then do a join with the user table to display everything I need. However I feel like there should be a better way to do this.

bigmike7801
  • 3,908
  • 9
  • 49
  • 77
  • 3
    Why does the thread need a to_id and a from_id? The message is the thing which binds together a sender and a receiver. The thread is just the container for the messages. Or am I misunderstanding? – tom redfern Jun 21 '11 at 07:18
  • No, you're correct. I was just having a difficult time with my queries selecting all threads that should be in the inbox and just the info from just the most recent message in the thread. Does that make sense? – bigmike7801 Jun 21 '11 at 14:42

1 Answers1

54

Why don't you separate out message relationships from user's view of each message?

I would do the threading by a self-referencing relationship on message. In other words, the message has a "responding_to_message_id" column.

I'm not sure I understand why you have a "to_id". Are messages directed to individual users? This seems very limited. I would think that you would either have no recipient (i.e. the recipient is a message board that anyone can read) or you would have the ability to specify multiple recipients, just like with an e-mail. Perhaps you can explain more about how the system is to be used.

Assuming (for simplicity) that you are posting to a board, so only the "from" is important, then you have your message table, with self-referencing relationship for threading, a user table, and then an intersection table between user and message that stores which messages have been read by each user.

That way, if you want to know if a user has read a message or not, just attempt to read the user ID in the intersection table for the given message. If it isn't there, then that message is unread by that user.

Note that if you want to have single recipients this design holds and if you want to have multiple recipients you can use an intersection table to hold the list of recipients for each message. If you do have a recipient intersection table, it can do double-duty as your read status table.

EDIT: ERD Sketch:

Here is a quick sketch of what I'm talking about...

ERD Sketch

Whether or not the sender has chosen to keep the message is flagged on the message itself. If the message is the start of a new thread, the reply_to_message_id column is NULL, otherwise it is the message_id of the parent message. There can be mulitple recipients, each of which have their own ability to keep the message or not, as well as the ability to track the date and time when the recipient reads the message.

EDIT 2: Alternate ERD and Querying for Most Recent Message

@OP asked how to query for the most recent message in a thread. The answer depends on the form of the thread. You can either have a flat thread where every message goes to the end of the linear stream of messages or you can have a tree-shaped thread where each message has a specific parent, unless it's the root of the thread. In the ERD above, the reply_to_message_id field could be used either way. If the thread is flat, then the FK is always to the root MESSAGE. If the thread is tree-shaped, then the FK is to the immediate parent of the reply MESSAGE.

If a typical query you want to run is "what is the most recent message in a thread?" and your threads are flat, then you can use SQL like this:

select top 1
  M.message_id
, M.sent_datetime
, M.title
, M.message_text
, S.user_id
, S.user_name
-- and anything else you want...
from MESSAGE M inner join USER S
  on M.sender_user_id = U.user_id
where M.reply_to_message_id = @ThreadRootMessageID
order by
  M.sent_datetime desc

If, on the other hand, your threads are tree-shaped and this is a query you want to be able to run quickly and easily, then the schema in the ERD above is not very easy to work with. SQL is not good at trees. You can solve the problem with a little bit of denormalization. See the ERD below:

Tree Thread ERD

Note that there is now one FK to show the immediate parent and one FK to show the root. Since threads aren't subject to editing - at least to edits where the root of a message is changed to point at a different thread, the denormalization that this entails does not imply risk of update anomallies so the redundancy is not too problematic.

If you use this ERD then the query for "most recent message in thread X" is the same as above, but with M.thread_root_message_id in the where clause instead of M.reply_to_message_id.

Joel Brown
  • 14,123
  • 4
  • 52
  • 64
  • Hi Joel, would you mind showing me how you would set up your tables, I'm not quite following how you would do it. Thanks! – bigmike7801 Jun 21 '11 at 14:43
  • I appreciate your contribution, however I don't see how that creates a threaded message. For example there can be 5 messages back and fourth for 1 thread. Does that make sense? – bigmike7801 Jun 21 '11 at 18:28
  • 1
    Message ID 1 has reply_to_message_id = null. Message ID 2 has reply_to_message_id = 1. Message ID 3 has reply_to_message_id = 2, and so forth. The thread is defined by each child message pointing to its parent. The first message in a thread doesn't have a parent. Every reply does. What goes in someone's inbox is either messages that point to them as senders (MESSAGE.sender_user_id) or what they have been sent (join through RECIPIENT). The sender_keep/recipient_keep is the "mark as read" feature. The relationship between messages is used for sorting them into threads visually. – Joel Brown Jun 21 '11 at 19:06
  • In your example, how would I run a query to select only the last message from each thread? – bigmike7801 Jun 22 '11 at 04:49
  • Wow! Very good and thorough answer! I'm going to go with a modified version of it. I'm going to use a thread_id to link the message table and recipient table together as well as a few other things but thanks for your detailed response! Also, you asked for a better explanation of what this will be used for and the answer is for private communication between 2 individuals. An initial message can be sent to multiple recipients, but a response can only be sent to the original author and no responses will not be able to be seen by others that the original post was sent to. – bigmike7801 Jun 23 '11 at 18:22
  • Thanks @JoelBrown - lots of very good data on this that has expanded my thought on how to properly design messaging databases... – Walker Farrow Apr 07 '16 at 21:27
  • 1
    @chovy Thread root is used as an arbitrary grouping attribute for everything that is on a thread without having to traverse the tree all the way up and all the way back down along every edge. You don't absolutely need it because you can derive this procedurally, but it can be handy to make the note "while you're in the neighbourhood" so to speak. – Joel Brown Mar 08 '21 at 22:49
  • @JoelBrown, thank you for providing such a detailed answer. I am new to database designs. Perhaps a dumb question: I would like to use this for 1:1 and group messages. Can I use the same design without modification? I can't figure out how to differentiate between 1:1 and group messages. – user1107173 Dec 05 '21 at 03:28
  • 1
    @user1107173 the models in my answer support 1:1 and 1:many messages where the individual recipients are defined. Messaging to a group is a bit trickier because a group is one address for many recipients. The models in my answer could be made to work for groups in a couple of ways. 1) you could just replace the group with the individual members of the group. This hides the fact that a group was used so it may not be ideal. 2) you could put both the group and all members in the recipient table with a code to indicate whether it's a group or a group member or an individual recipient. – Joel Brown Dec 05 '21 at 12:34