
I have a table that stores messages sent to users. Its layout is as follows:

id (auto-incrementing) | message_id | user_id | datetime_sent

I'm trying to find the first N message_ids that each user has received, but I'm completely stuck. I can do it easily on a per-user basis (when the user ID is specified in the query), but not for all users at once.

Things to note:

  • Many users can get the same message_id
  • Message IDs aren't sent sequentially (e.g. we can send message 400 before message 200)
  • This is a read-only MySQL database
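For reference, the per-user query mentioned above might look something like the sketch below. The table name sent_messages is borrowed from the first answer and the user ID 42 is a placeholder, both assumptions. Because message IDs are not sent in order, it sorts by datetime_sent rather than by message_id:

```sql
-- Sketch: first 5 messages received by ONE known user.
-- Assumed: table name sent_messages; 42 is a placeholder user_id.
SELECT message_id
FROM sent_messages
WHERE user_id = 42
ORDER BY datetime_sent, id  -- id only breaks ties between equal timestamps
LIMIT 5;
```

The hard part is that LIMIT applies to the whole result, not per user, which is why the answers below need a different technique.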

EDIT: I removed this part on second thought, but have added it back since someone was kind enough to work on it.

The end goal is to see what percentage of users opened at least one of the first N messages they received.

The table of opens looks like this:

user_id | message_id | datetime_opened
Brandon
  • Why not create a datetime column with message date inserted in db? – Santiago Jul 14 '16 at 20:19
  • @Santiago the column for the time_sent is actually a datetime, not just time. Edited post to clarify. Also, as mentioned in the notes, it's a read only DB. – Brandon Jul 14 '16 at 20:22
  • Are you asking for percentage of *all* users, or only those that received at least one message? – trincot Jul 14 '16 at 20:26
  • @trincot Only those who received at least one message – Brandon Jul 14 '16 at 20:29
  • Why did you remove the second table from the question? I have been working with that, but now it seems your question changed!? No more requirement that message must be among first 5 read, etc? Wow, I just wasted my time on this :-( – trincot Jul 14 '16 at 20:48
  • @trincot I was worried it made the scope too large. Post your answer please! I'll add it back in – Brandon Jul 14 '16 at 20:55
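Combining the goal stated in the question with the clarification in the comments (only users who received at least one message count), the quantity being asked for can be written as:

```latex
\text{percentage} = 100 \times
  \frac{\left|\{\, u : u \text{ opened at least one of the first } N \text{ messages it received} \,\}\right|}
       {\left|\{\, u : u \text{ received at least one message} \,\}\right|}
```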

2 Answers

This is an untested answer to the original question (with two tables and the condition on the first 5):

SELECT DISTINCT user_id
FROM   (
            -- for every opened message, count the messages sent to the same user before it
            SELECT     om.user_id,
                       om.message_id,
                       count(DISTINCT sm2.message_id) messages_before
            FROM       opened_messages om
            INNER JOIN sent_messages sm
                   ON  om.user_id = sm.user_id
                  AND  om.message_id = sm.message_id
            LEFT JOIN  sent_messages sm2
                   ON  om.user_id = sm2.user_id
                  AND  sm2.datetime_sent < sm.datetime_sent
            GROUP BY   om.user_id,
                       om.message_id
            -- keep only opened messages that were among the first 5 sent to that user
            HAVING     messages_before < 5
        ) AS base

The subquery joins in sm2 to count the messages that were sent to the same user before the opened one, and the HAVING clause keeps only opened messages with fewer than 5 earlier messages. Since one user may have opened several messages (up to 5) that satisfy this condition, the outer query lists each qualifying user only once.
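The query above lists the qualifying users but stops short of the percentage itself. A minimal, untested sketch of that last step, assuming the same sent_messages and opened_messages tables and N = 5, moves the "earlier messages" count into a correlated subquery and divides by the number of users who were sent at least one message (the denominator confirmed in the question's comments):

```sql
-- Sketch: % of users who opened at least one of the first 5 messages sent to them,
-- out of all users who were sent at least one message.
-- Assumed: tables sent_messages / opened_messages as in the query above.
SELECT 100 * COUNT(DISTINCT om.user_id)
       / (SELECT COUNT(DISTINCT user_id) FROM sent_messages) AS pct_opened_first_5
FROM opened_messages om
INNER JOIN sent_messages sm
        ON om.user_id = sm.user_id
       AND om.message_id = sm.message_id
-- keep an opened message only if fewer than 5 messages were sent to that user before it
WHERE (SELECT COUNT(DISTINCT sm2.message_id)
       FROM sent_messages sm2
       WHERE sm2.user_id = sm.user_id
         AND sm2.datetime_sent < sm.datetime_sent) < 5;
```

The WHERE condition is essentially the same test as the HAVING clause above; it only differs in the edge case where the same message was sent to the same user more than once.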

trincot
  • That is awesome! Pretty much exactly what I needed. Thank you. Can't wait to study your answer further. – Brandon Jul 14 '16 at 21:07

To get the first N messages per user (here N = 2), try:

SELECT
  user_id
  , message_id
FROM (
  SELECT
    user_id
    , message_id
    , id
    -- restart the counter at 1 whenever user_id changes, otherwise increment it
    , (CASE WHEN @user_id != user_id THEN @rank := 1 ELSE @rank := @rank + 1 END) AS rnk
    -- remember the current row's user_id for comparison with the next row
    , (CASE WHEN @user_id != user_id THEN @user_id := user_id ELSE @user_id END) AS _
  FROM (SELECT * FROM MessageSent ORDER BY user_id, id) T
  JOIN (SELECT @rank := 0) c
  JOIN (SELECT @user_id := 0) u
) R
WHERE rnk < 3
ORDER BY user_id, id
;

This uses a RANK substitute, derived from @Seaux's answer to "Does mysql have the equivalent of Oracle's “analytic functions”?"
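One caveat: the user-variable trick relies on MySQL evaluating the two CASE expressions left to right, which the MySQL manual says is not guaranteed, and setting variables inside expressions is deprecated in MySQL 8.0. MySQL may also ignore the ORDER BY inside a derived table when it merges that table into the outer query. On MySQL 8.0 or later (not available when this was written), the same first-N list can be sketched with the ROW_NUMBER() window function. This reuses the MessageSent name from the query above and orders by datetime_sent with id as a tie-breaker, which is an assumption (the query above orders by id alone):

```sql
-- MySQL 8.0+ only: number each user's messages in send order, keep the first 2.
SELECT user_id, message_id
FROM (
  SELECT user_id,
         message_id,
         id,
         ROW_NUMBER() OVER (PARTITION BY user_id
                            ORDER BY datetime_sent, id) AS rn
  FROM MessageSent
) ranked
WHERE rn <= 2
ORDER BY user_id, rn;
```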

To extend the user-variable ranking to your original question (the percentage of users who opened at least one of their first N messages), add the percentage calculation:

SELECT
  -- numerator: users who opened at least one of their first 2 messages
  COUNT(DISTINCT MO.user_id) * 100 /
   -- denominator: users who received at least one message
   (SELECT COUNT(DISTINCT user_id)
    FROM (
      SELECT
        user_id
        , message_id
        , id
        , (CASE WHEN @user_id != user_id THEN @rank := 1 ELSE @rank := @rank + 1 END) AS rnk
        , (CASE WHEN @user_id != user_id THEN @user_id := user_id ELSE @user_id END) AS _
      FROM (SELECT * FROM MessageSent ORDER BY user_id, id) T
      JOIN (SELECT @rank := 0) c
      JOIN (SELECT @user_id := 0) u
    ) R2
    WHERE rnk < 3
   ) AS percentage_who_read_one_of_the_first_messages
FROM MessageOpened MO
JOIN
   -- the first 2 messages per user, ranked the same way as in the first query
   (SELECT
      user_id
      , message_id
    FROM (
      SELECT
        user_id
        , message_id
        , id
        , (CASE WHEN @user_id != user_id THEN @rank := 1 ELSE @rank := @rank + 1 END) AS rnk
        , (CASE WHEN @user_id != user_id THEN @user_id := user_id ELSE @user_id END) AS _
      FROM (SELECT * FROM MessageSent ORDER BY user_id, id) T
      JOIN (SELECT @rank := 0) c
      JOIN (SELECT @user_id := 0) u
    ) R
    WHERE rnk < 3) MR
  ON MO.user_id = MR.user_id
  AND MO.message_id = MR.message_id
;

Since MySQL (before 8.0) has no CTEs, and the database is read-only, I see no way around repeating the ranking subquery twice in the statement.

See it in action: SQL Fiddle.
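MySQL 8.0, released after this answer, added both CTEs and window functions, which removes the need to repeat the ranking subquery. A WITH clause is part of the SELECT statement itself, so it should also work on a read-only database. A sketch under those assumptions (MySQL 8.0+, the MessageSent / MessageOpened names used above, N = 2, send order by datetime_sent with id as tie-breaker):

```sql
-- MySQL 8.0+ only: define the first-N set once, then reference it twice.
WITH first_messages AS (
  SELECT user_id, message_id
  FROM (
    SELECT user_id,
           message_id,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY datetime_sent, id) AS rn
    FROM MessageSent
  ) ranked
  WHERE rn <= 2
)
SELECT COUNT(DISTINCT mo.user_id) * 100
       / (SELECT COUNT(DISTINCT user_id) FROM first_messages)
       AS percentage_who_read_one_of_the_first_messages
FROM MessageOpened mo
JOIN first_messages fm
  ON mo.user_id = fm.user_id
 AND mo.message_id = fm.message_id;
```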

Please comment if this requires adjustment or further detail.

Abecee