2

I've been trying to get a handle on this, but no luck, so it'd be great if someone can help me.

I've got 2 tables. Items and Item Flags.

Item

id        name       levelid        containertypeid        roomid        userid      timestamp
--        ----       -------        ---------------        ------        ------      -----
1         Item 1     0              0                      3             5           1365072108
2         Item 2     0              0                      3             5           1365342108
3         Item 3     0              0                      3             5           1389072108
4         Item 4     0              0                      3             5           1365085108
5         Item 5     2              3                      3             5           1365962108
6         Item 6     1              3                      3             5           1365322108

Item_Flags

id        itemid     flagid        userid        timestamp
--        ----       -------       -------       ---------
1         1          1              5            1365072108
2         1          5              5            1365342108
3         1          2              5            1389072108
4         1          3              5            1365085108
5         2          4              5            1365762108
6         2          1              5            1365921087
7         2          3              5            1365252108
8         3          6              5            1365982108
9         3          7              5            1365312108
10        3          8              5            1365342108

The Goal

What I'm trying to do is get a list of all items, sorted by a descending timestamp. So, effectively, I want a list of items based on the latest timestamp from when we either updated the actual item or added a flag for the item. The application updates the timestamp each time a flag is added against a particular item or whenever the item name or level is updated. The problem is that I cannot run a GROUP BY if I want to get back the name, levelid, etc. fields because I have use them in an aggregate function, so the real value does not come back. If I don't group them, I obviously get multiple results for each item depending on how many flags it has. Here is my query:

SELECT 
                items.id,
                items.name,   
                items.levelid,
                items.containertypeid,
                items.roomid,
                items.userid
        FROM 
                items
        LEFT OUTER JOIN item_flags ON
                items.id = flags.itemid
        WHERE
                items.levelid = 0 AND 
                items.containertypeid = 0 AND 
                items.room = 3 AND
                items.userid = 5
        GROUP BY 
               items.id
        ORDER BY 
               items.timemodified DESC
               item_flags.timemodified DESC

I can get this to work if I wrap all of my select fields except items.id in aggregate functions and remove the ORDER by altogether, but I need to use those fields when I get them back and I need them ordered! I'm not even sure at this point if the logic in the query makes sense because whilst I am saying that the flags.itemid is equal to items.id, there is a one to many relationship there.

I've had a look at SQL Query - Grouping with multiple tables, but there isn't an ORDER BY there and I can't figure out how to make it work!

Using PG SQL.

Thanks!

==================================

As it turns out, at the end of it all, the query had to work with MySQL too and the With obviously does not work. So we've come up with a different solution which I'm running tests on will post as soon as I have, in case it might be useful for someone else.

Community
  • 1
  • 1
Meezaan-ud-Din
  • 1,213
  • 16
  • 21
  • It appears the only problem with your query is that you have a group by clause. Why do you think you need one? – Dan Bracuk Apr 18 '13 at 16:03
  • Because I only need 1 instance of an item returned because I'm just trying to figure out which items were last modified and depending on the limit passed to the query, get only a certain number of results. – Meezaan-ud-Din Apr 18 '13 at 18:51

3 Answers3

1

You can use the following GROUP BY

GROUP BY
            items.id,
            items.name,   
            items.levelid,
            items.containertypeid,
            items.roomid,
            items.userid

Since items.id is unique, this will group the records the same as "GROUP BY items.id".

This might seem like more work for the database, but it's up to the database to figure out that it can just check the id for the grouping.

Marlin Pierce
  • 9,931
  • 4
  • 30
  • 52
  • I also have to add the timestamp to the group by or this becomes an issue with the column not being defined in the group by or an aggregate function. I wish this worked, but it doesn't produced the right order if I add the timestamp to the GROUP BY. – Meezaan-ud-Din Apr 19 '13 at 11:05
  • OK, this works too as long as I add an items.timestamp in the GROUP BY too. – Meezaan-ud-Din Apr 19 '13 at 11:41
  • Fixed the amend and marked chosen answer for faster execution and a more concise even, even if I'm not clear on how the database is figuring the stuff out, but thanks! – Meezaan-ud-Din Apr 19 '13 at 12:00
  • Sorry, take it back, this won't give me the absolute latest timestamp of the entire recordset (this was a flaw in my original query too), but with Mark's answer I do get the max timestamp, which does the trick. – Meezaan-ud-Din Apr 19 '13 at 12:20
  • You could also group by the fields I listed, and add max(timestamp) to the select. Mark's answer will work, but might be misleading to the DB optimizer that the other fields might have multiple values and need a max. My way, the postgresql server should figure out it can use the primary key, and not worry about doing a max on name, levelid, etc. – Marlin Pierce Apr 19 '13 at 13:36
  • Got another solution Marlin, that works faster and is cleaner than the with or inline view. – Meezaan-ud-Din Apr 22 '13 at 13:28
1

One approach:

with cte as
(SELECT id,
        timestamp,
        name,   
        levelid,
        containertypeid,
        roomid,
        userid
 FROM items
 UNION ALL
 SELECT itemid id,
        timestamp,
        NULL as name,
        NULL as levelid,
        NULL as containertypeid,
        NULL as roomid,
        NULL as userid
 from item_flags)
select id,
       max(timestamp) as timestamp,
       max(name) as name,   
       max(levelid) as levelid,
       max(containertypeid) as containertypeid,
       max(roomid) as roomid,
       max(userid) as userid
from cte
group by id
order by 2 desc

SQLFiddle here.

Alternatively:

select id,
       max(timestamp) as timestamp,
       max(name) as name,   
       max(levelid) as levelid,
       max(containertypeid) as containertypeid,
       max(roomid) as roomid,
       max(userid) as userid
from 
(SELECT id,
        timestamp,
        name,   
        levelid,
        containertypeid,
        roomid,
        userid
 FROM items
 UNION ALL
 SELECT itemid id,
        timestamp,
        NULL as name,
        NULL as levelid,
        NULL as containertypeid,
        NULL as roomid,
        NULL as userid
 from item_flags) sq
group by id
order by 2 desc

SQLFiddle here.

  • I got this to work with some slight syntax amends, but it does not produce the desired result because it sorts stuff by max(timestamp) rather than timestamp and the max timestamp does not seem to be the largest timestamp from the grouped ids. This makes sense to me, because as I understand, we are saying order by the maximum timestamp from the unionized data for EACH id, whether that id has 5 or 10 results, we want to see the 1 with the maximum timestamp. No? – Meezaan-ud-Din Apr 19 '13 at 11:05
  • @AsifN: Sorry, I cannot understand your comment. You first say, "the max timestamp does not seem to be the largest timestamp from the grouped ids" - by definition, the max timestamp is the largest timestamp within the group. You then say, "for EACH id, whether that id has 5 or 10 results, we want to see the 1 with the maximum timestamp" - this is *exactly* what the query does. Please can you rephrase your last comment and/or amend your question to include the desired output from the provided sample data? –  Apr 19 '13 at 11:15
  • The results don't reflect what the query does. I'm checking my unit test with different data to see if the data is the problem or the query is - will report back shortly! – Meezaan-ud-Din Apr 19 '13 at 11:17
  • OK - it works. The query is fine, and now that I have tried, as is the one below, but yours is certainly more descriptive in terms of what is going on. The problem was my unit test - I was using usleep() to add a day when creating records! So it was my mistake - thanks! I will measure the performance of both before selecting the best answer, but thanks so much! – Meezaan-ud-Din Apr 19 '13 at 11:40
  • Whilst I get the correct id's ordered by the time, the name that comes back for each of those IDs is, well, incorrect. How would I get the correct name to come back? – Meezaan-ud-Din Apr 19 '13 at 12:28
  • @AsifN: The name that comes back should be correct, too - can you describe in what way it is "incorrect"? I have updated my answer to correct the syntax, and included a link to a SQLFiddle using the sample data in your question - it seems to be working correctly. –  Apr 19 '13 at 13:02
  • Again, it seems like it is a problem with my php code. Investigating - sorry and thanks for your patience! – Meezaan-ud-Din Apr 19 '13 at 13:31
  • As it turns out, at the end of it all, the query had to work with MySQL too (as this will be an add-on to an existing open source product) and the With obviously does not work. So we've come up with a different solution which I'm running tests on will post as soon as I have, in case it might be useful for someone else. – Meezaan-ud-Din Apr 20 '13 at 13:19
  • @AsifN: You can use the same approach without a CTE - just move the contents of the CTE into an inline view within the main query. I have updated my answer with such a query. –  Apr 20 '13 at 13:31
  • Awesome - I'll give this a shot too instead of comparing timestamps which also seems to work. Thanks. – Meezaan-ud-Din Apr 21 '13 at 15:31
  • OK, I think we've got a better working solution rather than creating an inline view I'm adding it as an answer below. The inline view works too! – Meezaan-ud-Din Apr 22 '13 at 13:28
-1

So in the end, we've settled for this:

SELECT 
items.id, 
max(items.name) as name,
max(item_flags.flagid) as flagid,
max(COALESCE(item_flags.timestamp, items.timestamp)) AS tm
FROM   
items
LEFT JOIN 
item_flags 
ON 
item_flags.itemid = items.id 
GROUP BY items.id 
ORDER BY tm DESC
Meezaan-ud-Din
  • 1,213
  • 16
  • 21
  • This doesn't actually match the requirements specified - "the latest timestamp from when we either updated the actual item or added a flag for the item". If the item has been updated *since* the last time a flag was added, this query will return the last time the flag was added, but from the requirement it should be the last time the item was updated - see eg. Item 3. Also, there's no `items.flagid` - presumably, that should be `max(item_flags.flagid) as flagid` ? –  Apr 22 '13 at 13:40
  • I've fixed the items.flagid to item_flags.flagid. It does match the requirement (I suppose it depends on how it is read). "The latest timestamp from when we either updated the actual item or added a flag for the item" to meant the latest of either of those 2, which is what this does. As far as the application is concerned, the item_flags and item are read and displayed in tandem, so if the there was a flag against the item after it was updated, for all practical purposes, we know it was updated. Sorry if it wasn't clear! – Meezaan-ud-Din Apr 22 '13 at 19:08
  • No, it doesn't, and you've misunderstood what I said. The issue is **not** "if the there was a flag against the item **after** it was updated", it's the other way around - if the item has been updated **since** the last time a flag was added. As I said before, look at the results for Item 3 in your test data - the timestamp on the Items table is 1389072108, which is larger than the largest timestamp value for the same item on the Item_Flags table (which is 1365982108). Your query returns 1365982108 - see http://sqlfiddle.com/#!2/425fe/4 . –  Apr 23 '13 at 05:27
  • Hi Mark, Yes you're right in that the timestamp value is not correct, but the order in which the results come back is still correct - whether the flag on item 3 is updated last or the item 3 itself was, it brings back the item with the largest timestamp on top. See http://sqlfiddle.com/#!2/68fa1/4. I still think it may be a matter of me not explaining myself properly, and if it is, I shall try and clarify. – Meezaan-ud-Din Apr 23 '13 at 09:58