8

I'm about to throw in the towel with this.

Preface: I want to make this work with any N, but for the sake of simplicity, I'll set N to be 3.

I've got a query (MySQL, specifically) that needs to pull in data from a table and sort based on top 3 values from that table and after that fallback to other sort criteria.

So basically I've got something like this:

SELECT tbl.id 
FROM
  tbl1 AS maintable 
  LEFT JOIN 
  tbl2 AS othertable 
  ON
  maintable.id = othertable.id
ORDER BY 
  othertable.timestamp DESC, 
  maintable.timestamp DESC

Which is all basic textbook stuff. But the issue is I need the first ORDER BY clause to only get the three biggest values in othertable.timestamp and then fallback on maintable.timestamp.

Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable.

I was almost able to make it work with a user variable based approach like this, but it fails since it doesn't take into account ordering, so it'll take the FIRST three othertable values it finds:

ORDER BY 
  (
    IF(othertable.timestamp IS NULL, 0, 
      IF(
        (@rank:=@rank+1) > 3, null, othertable.timestamp
      )
    )
  ) DESC

(with a @rank:=0 preceding the statement)

So... any tips on this? I'm losing my mind with the problem. Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query. Also, as noted, I'm on MySQL so any solutions using the ROW_NUMBER function are unfortunately out of reach.

Thanks to all in advance.

EDIT. Here's some sample data with timestamps dumbed down to simpler integers to illustrate what I need:

maintable

id      timestamp
1       100
2       200
3       300
4       400
5       500
6       600

othertable

id     timestamp
4      250
5      350
3      550
1      700

=>

1
3
5
6
4
2

And if for whatever reason we add WHERE NOT maintable.id = 5 to the query, here's what we should get:

1
3
4
6
2

...because now 4 is among the top 3 values in othertable referring to this set.

So as you see, the row with id 4 from othertable is not included in the ordering as it's the fourth in descending order of timestamp values, thus it falls back into getting ordered by the basic timestamp.

The real world need for this is this: I've got content in "maintable" and "othertable" is basically a marker for featured content with a timestamp of "featured date". I've got a view where I'm supposed to float the last 3 featured items to the top and the rest of the list just be a reverse chronologic list.

Tommi Forsström
  • 1,467
  • 11
  • 18
  • 2
    I'm having trouble getting the picture of this. Can you show sample data and the desired results? – Barmar Jul 29 '13 at 19:06
  • Why would the need to work with an arbitrary number of WHERE conditions applied to maintable prevent you from doing a LIMIT 3 subquery on othertable? –  Jul 29 '13 at 19:08
  • 1
    Nice blog post: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ – Prasanth Jul 29 '13 at 19:08
  • Barmar, adding sample data to my post – Tommi Forsström Jul 29 '13 at 19:11
  • @MarkBannister, because if you limit the subquery, you'd end up getting top 3 of ANY rows. If you've limited the rows you're getting from maintable, you'd most likely end up with the wrong 3. – Tommi Forsström Jul 29 '13 at 19:11
  • Added some extra info to the post. – Tommi Forsström Jul 29 '13 at 19:18
  • If you include the `order by` on the timestamp (descending) in the subquery with the limit, it should only select the first three (by date) from othertable - you can then left join from the maintable (query) to the subquery. –  Jul 29 '13 at 19:29
  • While it's true with the assumption that the main query will include ALL the values from othertable, what if for whatever reason the main query was restricted with "WHERE NOT maintable.id = 5"? – Tommi Forsström Jul 29 '13 at 19:31
  • This is the first time you have mentioned the possibility that conditions on the maintable would affect the rows that could be returned from the **othertable**. –  Jul 29 '13 at 19:35
  • Mark, it's stated very clearly in the initial question. The assumption that the top 3 rows from othertable are all actually included in the result set is way too loose. – Tommi Forsström Jul 29 '13 at 19:37
  • No, it's explicitly stated that the conditions in the query apply to **maintable**; nowhere is it stated that this will affect the rows returned from **othertable**. Reread you own question and point out where it says otherwise. –  Jul 29 '13 at 19:40
  • Come on, the way joins work makes it implicit that if you restrict what's being joined to has an implication on what's being joined with. If I remove rows from what's selected from maintable, it should be fairly obvious that not all of othertable should be joined. – Tommi Forsström Jul 29 '13 at 19:42
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/34395/discussion-between-mark-bannister-and-tommi-forsstrom) –  Jul 29 '13 at 19:42

3 Answers3

1

Maybe something like this.

SELECT
  id
FROM
  (SELECT 
    tbl.id,
    CASE WHEN othertable.timestamp IS NULL THEN 
      0 
    ELSE 
      @i := @i + 1
    END AS num,
    othertable.timestamp as othertimestamp,
    maintable.timestamp as maintimestamp
  FROM
    tbl1 AS maintable
    CROSS JOIN (select @i := 0) i 
    LEFT JOIN tbl2 AS othertable 
      ON maintable.id = othertable.id
  ORDER BY
    othertable.timestamp DESC) t
ORDER BY
  CASE WHEN num > 0 AND num <= 3 THEN
    othertimestamp
  ELSE
    maintimestamp
  END DESC
GolezTrol
  • 114,394
  • 18
  • 182
  • 210
  • Unfortunately my scenario prevents an approach such as this: "Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query." – Tommi Forsström Jul 29 '13 at 19:19
1

Modified answer:

select ilv.* from
(select sq.*, @i:=@i+1 rn from
 (select @i := 0) i
  CROSS JOIN 
 (select m.*, o.id o_id, o.timestamp o_t
  from maintable m
  left join othertable o
  on m.id = o.id
  where 1=1
  order by o.timestamp desc) sq
) ilv
order by case when o_t is not null and rn <=3 then rn else 4 end,
         timestamp desc

SQLFiddle here.

Amend where 1=1 condition inside subquery sq to match required complex selection conditions, and add appropriate limit criteria after the final order by for paging requirements.

  • That will not work as was explicitly stated in the question: "Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable." – Tommi Forsström Jul 29 '13 at 19:30
  • The amended version is now pretty similar to GolzTrol's, but I've made a couple of suggestions how it might be amended in line with the existing system. –  Jul 29 '13 at 20:34
  • I guess this is the only way to do it, but it still uses a wrapping query, so it unfortunately won't make my life easier in this context. I'll give you an upvote for effort, but I'll leave the correct answer unticked in the hopes that some magic fairy would arrive with an unwrapped solution. I may just be fighting a losing battle here. Thanks, though. – Tommi Forsström Jul 29 '13 at 21:14
  • @TommiForsström: The only alternative I can think of would be to use the main query to create a temporary table (like so: http://stackoverflow.com/questions/5859391/create-temporary-table-from-select-statement-without-using-create-table ) - so you could apply the appropriate conditions without using a wrapping query; *then*, join the temporary table to othertable, in a second query. However, that approach does use *two* queries. –  Jul 30 '13 at 06:07
  • Yeah, that did come to mind, but that might just be a somewhat terrible performance hazard. Also (kind of out of the scope of this discussion) that would make me assume that all the other modifiers to the query have been run already, meaning that the main query already would return all the rows it's going to return. Thanks anyways. – Tommi Forsström Jul 30 '13 at 15:32
1

Can you use a union query as below?

(SELECT id,timestamp,1 AS isFeatured FROM tbl2 ORDER BY timestamp DESC LIMIT 3)
UNION ALL
(SELECT id,timestamp,2 AS isFeatured FROM tbl1 WHERE NOT id in (SELECT id from tbl2 ORDER BY timestamp DESC LIMIT 3))
ORDER BY isFeatured,timestamp DESC

This might be somewhat redundant, but it is semantically closer to the question you are asking. This would also allow you to parameterize the number of featured results you want to return.

greg
  • 11
  • 1