1

I'm building a messaging feature in a Django 1.6.2 application using PostgreSQL 9.3.4. On a user's "Messages" home page, I'll show a list of conversations that a user has with other users. Each conversation "tile" or block will show the picture and name of the other user in that conversation, the date the last message in that conversation was sent, and the first 25 characters in that last message. I'll also show a small "reply" icon if the last message was sent by user who is viewing these conversations. I've got my query to the point where I can identify all of the conversations between the viewer and all the other users but I'm having trouble pulling in the fields I need from the User and Message tables.

My tables (shown at the bottom) are User, Message, and Conversation. Although I've implemented my table schemas so that there is a many-to-many relationship between users and conversations, in the beginning I'm going to create my interface so that a user can only send a message to one other user instead of multiple users.

When I run my query on the data shown below, what I'm trying to get back is the conversation and user IDs for users 3, 4, 5 along with their associated usernames, the last message in that conversation, who sent it, and the date it was sent. Instead, I'm getting the error:

ERROR: syntax error at or near "WHERE"

Can anyone help me fix this query? I'm more interested in speed than elegance.

Test case

Data in conversation_user linking table:

 id | conversation_id | user_id 
----+-----------------+---------
  1 |               1 |      32
  2 |               1 |       3   <- want this
  3 |               2 |      32
  4 |               2 |       4   <- want this
  6 |               3 |       3
  7 |               3 |       1
  8 |               4 |      32
  9 |               4 |       5   <- want this
 10 |               5 |       7
 11 |               5 |       9

Rows I want to return. Each message is last message in that conversation.

conversation_id | user_id | username  | from_user | message | send_date
----------------+---------+-----------+-----------+---------+----------
 1              | 3       | user3     | u3 or u32 | <msg3>  | <date>
 2              | 4       | user4     | u4 or u32 | <msg4>  | <date>
 4              | 5       | user5     | u5 or u32 | <msg5>  | <date>

Query that isn't working:

SELECT cu.conversation_id,
       cu.user_id,
       au.username,
       m.from_user,
       m.message,
       m.send_date
FROM conversation_user cu
INNER JOIN auth_user au ON cu.user_id = au.id
INNER JOIN message m ON cu.conversation_id = m.conversation_id
ORDER BY m.send_date DESC LIMIT 1
WHERE conversation_id IN
    (SELECT conversation_id
     FROM conversation_user
     WHERE user_id = 32)
  AND user_id != 32;

Table definitions

# auth_user
--------------+--------------------------+------------------------------
 id           | integer                  | not null default nextval(...
 username     | character varying(30)    | not null
Referenced by:
    TABLE "conversation_user" CONSTRAINT "conversation_user_user_id_fkey" FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "message" CONSTRAINT "message_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED

# conversation
------------+--------------------------+--------------------------------
 id         | integer                  | not null default nextval(...
 start_date | timestamp with time zone | not null
Referenced by:
    TABLE "conversation_user" CONSTRAINT "conversation_id_refs_id_4344ca71" FOREIGN KEY (conversation_id) REFERENCES conversation(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "message" CONSTRAINT "message_conversation_id_fkey" FOREIGN KEY (conversation_id) REFERENCES conversation(id) DEFERRABLE INITIALLY DEFERRED

# conversation_user
-----------------+---------+--------------------------------------------
 id              | integer | not null default nextval(...
 conversation_id | integer | not null
 user_id         | integer | not null
Foreign-key constraints:
    "conversation_id_refs_id_4344ca71" FOREIGN KEY (conversation_id) REFERENCES conversation(id) DEFERRABLE INITIALLY DEFERRED
    "conversation_user_user_id_fkey" FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED

# message
     Column      |           Type           |
-----------------+--------------------------+---------------------------
 id              | integer                  | not null default nextval(...
 conversation_id | integer                  | not null
 from_user_id    | integer                  | not null
 to_user_uid     | integer                  | not null
 message         | text                     | not null
 send_date       | timestamp with time zone | not null
Foreign-key constraints:
    "message_conversation_id_fkey" FOREIGN KEY (conversation_id) REFERENCES conversation(id) DEFERRABLE INITIALLY DEFERRED
    "message_from_user_id_fkey" FOREIGN KEY (from_user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Jim
  • 13,430
  • 26
  • 104
  • 155
  • 1
    Move the `ORDER BY` to after the `WHERE` conditions: `WHERE conversation_id..... ORDER BY m.send_date DESC LIMIT 1` – Lamak Jan 14 '15 at 20:35
  • Why are you writing raw SQL rather than using the Django model layer? – Daniel Roseman Jan 14 '15 at 20:43
  • @DanielRoseman, I'm using raw SQL because although I'm not a SQL expert, I'm much more comfortable writing SQL than using Django's ORM commands. Also, I'd like to eliminate any possible performance overhead that might result from Django having to translate ORM commands to SQL. – Jim Jan 15 '15 at 00:12
  • @DanielRoseman: What would be the benefit of using the Django model layer? – Erwin Brandstetter Jan 15 '15 at 00:39
  • @Lamak, thanks for your input but I had to give credit to Erwin as he provided me with an enormous amount of help. – Jim Jan 15 '15 at 03:54
  • @DanielRoseman, I do use the ORM for simple queries and Django's models to establish the database's initial structure. But I wouldn't even begin to attempt to translate the solution query to Django's ORM. – Jim Jan 15 '15 at 03:56

1 Answers1

2

Fix syntax

Basically, you just need to move the WHERE condition to its proper place, like @Lamak commented:

SELECT  ...
FROM conversation_user cu
INNER JOIN ...
WHERE conversation_id IN
    (SELECT conversation_id
     FROM conversation_user
     WHERE user_id = 32)
AND user_id != 32
ORDER BY m.send_date DESC
LIMIT 1;

Make it fast

According to comment:

I'm trying to select the last message in each of the [...] conversations user 32 is having.

SELECT cu.conversation_id
     , ufrom.username AS from_user
     , uto.username   AS to_user
     , m.message
     , m.send_date
FROM   conversation_user cu
LEFT   JOIN LATERAL (
   SELECT from_user_id, to_user_id, message, send_date
   FROM   message   m
   WHERE  m.conversation_id = cu.conversation_id
   ORDER  BY send_date DESC
   LIMIT  1
   ) m ON TRUE
LEFT   JOIN auth_user ufrom ON ufrom.id = m.from_user_id
LEFT   JOIN auth_user uto   ON uto.id = m.to_user_id
WHERE  cu.user_id = 32;

Notes

DB design

  • The query assumes that (user_id, conversation_id) is UNIQUE - which you confirmed in the comment. Be sure to add an actual UNIQUE constraint, which provides the much needed index automatically.

  • An index on message on (conversation_id, send_date DESC) would help, too. Details:

  • Assuming auth_user.id is the PK, so it would be indexed.

  • message.to_user_uid is probably supposed to be to_user_id - like from_user_id.

  • You probably want to add another FK to stay consistent:

    "message_to_user_id_fkey" FOREIGN KEY (to_user_id) REFERENCES auth_user(id)
    

    Not sure why you think you need DEFERRABLE INITIALLY DEFERRED. If you don't know you need this, remove it. It's for special purposes and makes regular operations more expensive.

  • If only two users can take part in the same conversation, it would be more efficient to remove conversation_user altogether and add user1 and user2 or similar to conversation - unless there are more attributes for each combination of user/conversation. Potentially simplify message, too. You only need a boolean information instead of from_user and to_user.
    According to relational theory, conversation can be seen as a the implementation of many-to-many relationship between table auth_user and itself.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Erwin, thanks very much. This is _almost_ what I'm trying to do. I'm trying to select the last message in _each_ of the three conversations user32 is having (with users 3, 4, and 5). This is giving me only the last message in the last conversation between user32 and user5. I'm going to continue looking at this to see if I can fix it. Please let me know if you know what changes need to be made. Thanks again! – Jim Jan 15 '15 at 00:05
  • This [query](http://pastebin.com/iH7AKiTw) actually retrieves the desired rows; it just lacks the columns from the user and message tables. It's derived from an earlier [question](http://stackoverflow.com/questions/27761521/find-rows-that-have-same-value-in-one-column-and-other-values-in-another-column) you helped me with. – Jim Jan 15 '15 at 00:20
  • @Robert: Any number of users can take part of the same conversation? Each user only *one* time, though? A UNIQUE constraint would enforce that (and create the mentioned index automatically). – Erwin Brandstetter Jan 15 '15 at 00:43
  • Oh no, only two users can take part in each conversation. user32 and user3 can have a conversation and user32 and user4 can have a separate conversation but users 32, 3, and 4 can't all be in the same conversation. In each conversation, each of the two users can send as many messages as they want to the other user. I've just written a query that selects the message fields using the conversation_id so it almost appears that the query that selects the conversations based on user_id and yields the conversation_id needs to be a subquery of the message table query. I'm trying to figure that out. – Jim Jan 15 '15 at 01:05
  • Conceptually, it needs to work like [this query](http://pastebin.com/pXZhzdjc) except that this query won't execute. Subqueries can only select one column and my subquery selects four. – Jim Jan 15 '15 at 01:17
  • @Robert: I removed some outdated stuff after providing a new query. Some links, too, which I suggest to follow and read. – Erwin Brandstetter Jan 15 '15 at 01:35
  • There are a number of "u"s in your example (e.g. "ufrom ON ufrom.id"). Those are typos, I believe? – Jim Jan 15 '15 at 02:09
  • @Robert: Those are table aliases. [Details in the manual.](http://www.postgresql.org/docs/current/interactive/queries-table-expressions.html#QUERIES-TABLE-ALIASES) – Erwin Brandstetter Jan 15 '15 at 02:20
  • Oh, I see it now. But Postgres doesn't like the SELECT line 8. It says 'ERROR: column "from_user" does not exist" I'm working on it. – Jim Jan 15 '15 at 03:23
  • Erwin, it works!!! I also had to make that change in the JOIN clause further down and then also change 'to_user' to 'to_user_uid'. I _really_ appreciate your help. Clearly you've mastered SQL. Are there any books you could recommend I study to improve my SQL skills? I'm building a mobile web app so performance is important. Many articles I read say sites are slow because of their DB interactions. Therefore, I'll need to learn to write SQL that does the job and is highly performant. Thanks again! – Jim Jan 15 '15 at 03:52
  • @Robert: My favorite source is the [excellent manual](http://www.postgresql.org/docs/current/interactive/). It's like with any other skill: you can get far in a short while if you put some effort into it, but it takes a *long* time for mastery. Consider hiring an expert. – Erwin Brandstetter Jan 15 '15 at 04:06