-1

This question is a follow-up to Select first rank if RANK()+1 overflows with JOINS

The scenario

I have a bunch of predefined messages that my Discord and Twitch bots send to dedicated channels every 30 minutes in round-robin only if X messages where sent by other users between each of the bot's messages. The round-robin is independant from one channel to another.

SQL tables

CREATE TABLE `loop_msg` (
  `msg_id` int(11) NOT NULL,
  `content` varchar(2000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `loop_msg` (`msg_id`, `content`) VALUES
(2, 'Content 2'),
(3, 'Content 3'),
(4, 'Content 4'),
(6, 'Content 6'),
(7, 'Content 7'),
(8, 'Content 8');

CREATE TABLE `loop_msg_status` (
  `channel_id` bigint(20) NOT NULL,
  `msg_id` int(11) NOT NULL DEFAULT 0,
  `inbetween` int(11) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `loop_msg_status` (`channel_id`, `msg_id`, `inbetween`) VALUES
(123456789012345671, 2, 10),
(123456789012345672, 4, 30),
(123456789012345673, 6, 10),
(123456789012345674, 6, 0),
(123456789012345675, 6, 15),
(123456789012345676, 8, 10);

ALTER TABLE `loop_msg`
  ADD PRIMARY KEY (`msg_id`);

ALTER TABLE `loop_msg_status`
  ADD PRIMARY KEY (`channel_id`);

ALTER TABLE `loop_msg`
  MODIFY `msg_id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=9;
COMMIT;

The current query

Now, I need every 30 minutes to get every channel_id with inbetween >= 10 from msg_loop_status, the next message's content from loop_msg and the next msg_id from loop_msg as well since after posting the new message, I will UPDATE that channel_id's row with the new msg_id and set inbetween to 0. More specifically, if we're at the end of loop_message (with the highest msg_id), we need to restart the round-robin from the beginning (the lowest msg_id).

Here is what I currently have working:

WITH cte AS (
        SELECT loop_msg.msg_id
             , channel_id
             , inbetween
             , COALESCE(
                           LEAD(loop_msg.msg_id)        OVER w1
                         , FIRST_VALUE(loop_msg.msg_id) OVER w1
                       ) AS msg_id_next
             , COALESCE(
                           LEAD(content)        OVER w1
                         , FIRST_VALUE(content) OVER w1
                       ) AS content_next
          FROM      loop_msg
          LEFT JOIN loop_msg_status
            ON loop_msg.msg_id = loop_msg_status.msg_id
          WINDOW w1 AS (ORDER BY loop_msg.msg_id)
     )
SELECT channel_id, content_next, msg_id, msg_id_next
  FROM cte
 WHERE channel_id IS NOT NULL AND inbetween >= 10
;

The Fiddle

The problem

The problem

When multiples channels are at the same place (same msg_id for different channel_id in loop_msg_status), the ordering doesn't work correctly and the values for next_content and next_msg_id are the current values.

Expectations

Back to the basics: I have 1 channel_id in loop_msg_status where I send content from loop_msg in roundrobin every 30 minutes if inbetween >= 10. I store the msg_id of the sent content in loop_msg_status for that channel, and 30 minutes later, I query again to see where I'm at and post the next content. For example take this row:

channel_id: 123456789012345672
msg_id: 4
inbetween: 30

When doing my SELECT in msg_loop_status, since inbetween >= 10, this channel_id should be selected with the next msg_id that comes after 4 and its corresponding content from msg_loop. Because there isn't a msg_id=5, the next one is actually 6. So the output would be like this:

channel_id: 123456789012345672
next_msg_id: 6
next_content: Content 6

Applying that same logic to the whole example, I'm expecting the following output (the colums such as the current msg_id and content are not needed but can be added for the sake of the example for readability). 5 of the 6 channels have inbetween >= 10, so for each of these channels I need their next_msg_id and next_content as such:

channel_id         | msg_id | next_msg_id | next_content
-----------------------------------------------
123456789012345671 | 2      | 3           | Content 3
123456789012345672 | 4      | 6           | Content 6
123456789012345673 | 6      | 7           | Content 7
123456789012345675 | 6      | 7           | Content 7
123456789012345676 | 8      | 2           | Content 2
dan
  • 3,439
  • 17
  • 54
  • 82
  • "the ordering doesn't work correctly" Where & how? Tables have no order, result sets do (per an outermost order by). PS Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). PS For debugging a [mre] includes the least code you can give that is code that you show is OK extended by code that you show is not OK.--In your post. When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamentals.) – philipxy Feb 15 '22 at 02:17
  • Put what is needed to ask in your post, not just at a fiddle. The [mre] link says this. Also, you do not clearly explain the problem--as I just said--"where & how" are missing--what ordering & where & why would it give the result you expect from the buggy query? You are not explaining why you expect what you expect. So we can't address it. You get what you get because that's how the language is defined. Even if we wrote yet another presentation of the language semantics, we can't address your misconceptions without your reasoning. Include justification per documentation before asking us to. – philipxy Feb 15 '22 at 02:33
  • Please clarify via edits, not comments. PS Your comment roughly says what sort of inputs give wrong ordering. What I said was to explain why your code should give the right ordering. PS Presumably you misunderstand relevant aspects of ordering. And I already told you, tables have no order, result sets have order. Some inaccessible intermediate results have order, like the ordering used before LIMIT/TOP is applied after ORDER BY in a subquery, or in evaluating window functions. But the table they are used to make has no row order. – philipxy Feb 15 '22 at 02:40
  • You wrote the code expecting it to work in all cases, so explain why you thought it would. (A necessary part of any MRE.) Good luck. – philipxy Feb 15 '22 at 02:45
  • 2
    Please clarify what you mean by "The round-robin is independent from one channel to another." That will be the key to understanding the correct solution. If channels are independent, we will never see the next detail related to a different channel. – Jon Armstrong Feb 15 '22 at 05:05

1 Answers1

1

Updated to reflect your latest detail, where the next message id should be based on the unique / ordered list of message rows, without regard to the status table.

The fiddle

WITH msgs AS (
        SELECT msg_id, content
             , COALESCE(
                           LEAD(loop_msg.msg_id)        OVER w1
                         , FIRST_VALUE(loop_msg.msg_id) OVER w1
                       ) AS msg_id_next
             , COALESCE(
                           LEAD(content)        OVER w1
                         , FIRST_VALUE(content) OVER w1
                       ) AS content_next
          FROM      loop_msg
        WINDOW w1 AS (ORDER BY loop_msg.msg_id)
     )
   , cte AS (
        SELECT loop_msg.msg_id
             , channel_id
             , inbetween
             , msg_id_next
             , content_next
          FROM      msgs AS loop_msg
          LEFT JOIN loop_msg_status
            ON loop_msg.msg_id = loop_msg_status.msg_id
     )
SELECT channel_id, content_next, msg_id, msg_id_next
  FROM cte
 WHERE channel_id IS NOT NULL AND inbetween >= 10
;

The result:

The result

The outer join probably is no longer required:

WITH msgs AS (
        SELECT msg_id, content
             , COALESCE(
                           LEAD(loop_msg.msg_id)        OVER w1
                         , FIRST_VALUE(loop_msg.msg_id) OVER w1
                       ) AS msg_id_next
             , COALESCE(
                           LEAD(content)        OVER w1
                         , FIRST_VALUE(content) OVER w1
                       ) AS content_next
          FROM      loop_msg
        WINDOW w1 AS (ORDER BY loop_msg.msg_id)
     )
   , cte AS (
        SELECT loop_msg.msg_id
             , channel_id
             , inbetween
             , msg_id_next
             , content_next
          FROM msgs AS loop_msg
          JOIN loop_msg_status
            ON loop_msg.msg_id = loop_msg_status.msg_id
     )
SELECT channel_id, content_next, msg_id, msg_id_next
  FROM cte
 WHERE inbetween >= 10
;

and finally:

WITH msgs AS (
        SELECT msg_id, content
             , COALESCE(
                           LEAD(loop_msg.msg_id)        OVER w1
                         , FIRST_VALUE(loop_msg.msg_id) OVER w1
                       ) AS msg_id_next
             , COALESCE(
                           LEAD(content)        OVER w1
                         , FIRST_VALUE(content) OVER w1
                       ) AS content_next
          FROM      loop_msg
        WINDOW w1 AS (ORDER BY loop_msg.msg_id)
     )
SELECT channel_id, content_next, loop_msg.msg_id, msg_id_next
  FROM msgs AS loop_msg
  JOIN loop_msg_status
    ON loop_msg.msg_id = loop_msg_status.msg_id
 WHERE inbetween >= 10
;
Jon Armstrong
  • 4,654
  • 2
  • 12
  • 14
  • 1
    Tested it with a lot of data and it works perfectly. I especially like the simplicity of the last example. Now I guess I keep finding new problems but I noticed that if a message gets deleted from `loop_msg`, this query will simply ignore the channels with that `msg_id` it since `loop_msg.msg_id` doesn't exist anymore for `loop_msg_status.msg_id`. I will work around the problem and have my interface to manage messages handle this properly by changing the status to the previous `msg_id`. Thanks a lot for your help, we can consider the matter resolved. – dan Feb 15 '22 at 23:12