5

Let's say we have a Category - Items one-to-many relation. I would like to do this

SELECT c.*, 
   (SELECT COUNT(*) FROM items i WHERE i.catId=c.id)
    AS itemCount
FROM category c

And let's say we have a Hibernate POJO "class Category".

My first question is I really couldn't figure out that from that query I get a List<Category> object right? And how can I access the "itemCount"? Because there's no Category.getItemCount()

And secondly, how can I write the Criteria query?

Thanks

Seregwethrin
  • 1,319
  • 2
  • 13
  • 24

3 Answers3

8

Seems like this is the answer I was looking for (into to POJO):

@Formula(value="(SELECT COUNT(*) FROM Items i WHERE i.id = id)")
@Basic(fetch=FetchType.EAGER)
public Integer getItemCount() {
   return this.taskCount;
}
Seregwethrin
  • 1,319
  • 2
  • 13
  • 24
2

Depending on your circumstances and ability to create a view. I would just create a view out of your query:

CREATE VIEW CategoryItemsView AS 
    SELECT c.*,  
   (SELECT COUNT(*) FROM items i WHERE i.catId=c.id) 
    AS itemCount 
FROM category c 

afterwards you can query how ever you like...

SELECT * FROM CategoryItemsView WHERE ItemCount = 5

Additionally, you could use a GROUP BY to achieve a similar result but that depends on your columns and the schema of your tables.

So, something like this:

SELECT c.COLUMN1, c.COLUMN2, COUNT(*) AS ItemCount
FROM category c inner join items i on i.catID = c.Id
GROUP BY c.COLUMN1, c.COLUMN2
HAVING COUNT(*) = 2
Bob Delavan
  • 647
  • 5
  • 6
  • Creating a view crossed my mind but I really don't want to create a view for every count, max, min or any other aggregate function or subquery. If I'd make a one big view which includes any possible of those then that's a performance kill. Also I still doesn't understand how to access that ItemCount in your second example through Hibernate POJO files, since they don't contain ItemCount (but yes first solution will be containing the ItemCount, but that is as I said before seemed to be not a smart solution to me). – Seregwethrin Mar 24 '12 at 18:50
  • On a second thought, can I put an "int itemCount" property into POJO file and will it work properly even with the itemCount is not present in the query? (Without the subquery) – Seregwethrin Mar 24 '12 at 19:03
  • I just tried to map a View into Hibernate, and it was a disaster... Because views doesn't have keys, Hibernate Tools created 2 classes, CategoryView and CategoryViewId, which is also suggested pattern for this type of situations according to some articles I read. Now I have 3 class files, Category (the actual table), CategoryView and CategoryViewId. I definitely not go for this solution, if there's not a better system for views in Hibernate. – Seregwethrin Mar 24 '12 at 19:32
2

About the query

Inspired by @a_horse's comment I ran a quick test with a real life table for a similar purpose.

  • 80 categories
  • 6862 items
  • optimal indexes (on items.catID. There are more, but not useful here.)

Three candidates. Results are identical. Query plans and performance vary.

1) Subselect per category (Seregwethrin's original)

Total runtime: 20.351 ms

SELECT c.*
      ,(SELECT COUNT(*) FROM items i WHERE i.catid = c.id) AS item_ct
FROM category c

2) LEFT JOIN, then GROUP

Total runtime: 36.320 ms

SELECT c.*
      ,count(*) AS item_ct
FROM   category c
LEFT   JOIN items i ON i.catid = c.id
GROUP  BY c.catid;  -- prim. key of category

3) GROUP, then LEFT JOIN

Total runtime: 18.588 ms

SELECT c.*
      ,item_ct
FROM   category c
LEFT   JOIN  (
    SELECT catid
          ,count(*) AS item_ct
    FROM   items
    GROUP  BY catid
    ) i ON i.catid = c.id

So, my first suggestion wasn't any good. As expected (after some thought), version 3) performs best. It makes sense, too: If you count first, and then JOIN, fewer join operations are necessary.

The difference in performance will become more distinct for bigger tables, especially with more categories.

For 2) to work you need PostgreSQL 9.1 and category.id needs to be the primary key.
For older versions you would have to list all non-aggregated columns in the GROUP BY clause.

I switched to LEFT JOIN generally, because the original query includes categories without associated items.

The index on items.catID is only used by 1) where multiple subqueries can profit. In the other queries sequential scans are faster: all of both tables has to be read anyway.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    I wouldn't be surprised if the optimizer generated the same plan for your query and the original one. Note that you probably want a left join to get a `0` count for those categories without items (which is what the original statement is doing) –  Mar 24 '12 at 16:39
  • @a_horse_with_no_name: Yep, LEFT JOIN crossed my mind, too. I'll check if the planner optimizes ... – Erwin Brandstetter Mar 24 '12 at 16:42
  • Well, this answer really is not an answer for my question :) But anyway, is there an index for "id" and "catid" column of items table? – Seregwethrin Mar 24 '12 at 20:26
  • @Seregwethrin: Yeah, not exactly what you asked for. :) But may be useful for you, still. I added a bit on index use in the test case. – Erwin Brandstetter Mar 24 '12 at 20:45