1

I'm not super experienced with sql in general, and I'm trying to accomplish a pretty specific task- I want to first run a query to get the ID's of all my units with the top number of hits, and then from that run again to get the messages and counts of all the types of hits for those IDs in a specific time period.For the first query, I have this:

SELECT entity, count(entity) as Count
from plugin_status_alerts
where entered BETWEEN now() - INTERVAL '14 days' AND now()
group by entity
order by count(entity) DESC
limit 10

which results in this return:

"38792";3
"39416";2
"37796";2
"39145";2
"37713";2
"37360";2
"37724";2
"39152";2
"39937";2
"39667";2

The idea is to then use that result set to then run another query that orders by entity and status_code. I tried something like this:

SELECT status_code, entity, COUNT(status_code) statusCount
FROM plugin_status_alerts
where updated BETWEEN now() - INTERVAL '14 days' AND now() AND entity IN 
(SELECT id.entity, count(id.entity) as Count
from plugin_status_alerts id
where id.updated BETWEEN now() - INTERVAL '14 days' AND now()
group by id.entity
order by count(id.entity) DESC
limit 10
)
GROUP BY status_code, entity

but I get the error

ERROR: subquery has too many columns

I'm not sure if this is the route I should be going, or if maybe I should be trying a self join- either way not sure how to correct for whats happening now.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • As always, your version of Postgres and the details of the table definition are relevant for the best solution (exactly what you get with `\d tbl` in `psql`). Also, is it on purpose that you filter on `entered` in the 1st query, but on `updated` in the 2nd? – Erwin Brandstetter Mar 18 '15 at 03:14

2 Answers2

1

Use a JOIN instead of IN (subquery). That's typically faster, and you can use additional values from the subquery if you need to (like the total count per entity):

SELECT entity, status_code, count(*) AS status_ct
FROM  (
   SELECT entity  -- not adding count since you don't use it, but you could
   FROM   plugin_status_alerts
   WHERE  entered BETWEEN now() - interval '14 days' AND now()
   GROUP  BY entitiy
   ORDER  BY count(*) DESC, entitiy  -- as tie breaker to get stable result
   LIMIT  10
   ) sub
JOIN   plugin_status_alerts USING (entity)
WHERE  updated BETWEEN now() - interval '14 days' AND now()
GROUP  BY 1, 2;

Notes

  • If you don't have future entries by design, you can simplify:

    WHERE  entered > now() - interval '14 days'
    
  • Since the subquery only returns a single column (entity), which is merged with the USING clause, column names are unambiguous and we don't need table qualification here.

  • LIMIT 10 after you sort by the count is likely to be ambiguous. Multiple rows can tie for the 10th row. Without additional items in ORDER BY, Postgres returns arbitrary picks, which may or may not be fine. But the result of the query can change between calls without any changes to the underlying data. Typically, that's not desirable and you should add columns or expressions to the list to break ties.

  • count(*) is a bit faster than count(status_code) and doing the same - unless status_code can be null, in which case you would get 0 as count for this row (count() never returns null) instead of the actual row count, which is either useless or actively wrong. Use count(*) either way here.

  • GROUP BY 1, 2 is just syntactical shorthand. Details:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Thanks a lot! Not only did it solve my problem, your explanations make a lot of sense and help me get a better handle on things in general. – user2150073 Mar 18 '15 at 16:40
0

When you plug your first query into the second and use it in the in clause you still return two columns when the in only wants one. Either do this:

SELECT status_code, entity, COUNT(status_code) statusCount
FROM plugin_status_alerts
where updated BETWEEN now() - INTERVAL '14 days' AND now() 
AND entity IN (
    SELECT id.entity
    from plugin_status_alerts id
    where id.updated BETWEEN now() - INTERVAL '14 days' AND now()
    group by id.entity
    order by count(id.entity) DESC
    limit 10
)
GROUP BY status_code, entity

Or use the first query as a derived table and join with it.

jpw
  • 44,361
  • 6
  • 66
  • 86