0

I am trying to get the sum of the size of files contained in a directory for a group. However, the size is being multiplied due to the number of rows returned from the joins. I also have other tables contained in my query to get history information. I have a PostgreSQL database and I am using a hibernate query in my repository interface.

Online, it says I need to subquery my aggregation. However I need to join all these tables to get the files relating to the directory which relate to the user which relates to the group. How can I do a distinct sum?

@Query(value = "select a.username as name, sum(d.filesize) as size"
        + "from usergroup a "
        + "join a.nuser b "
        + "join b.directory c "
        + "join c.files d "
        + "join b.nuserhistory e"
        + "group by a.name"
        + "having count(e) > b.maxCount")

Relations between tables:

  • many usergroups to many nusers.
  • many directorys to one nuser.
  • many files to one directory.
  • many histories to one nuser.
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Sarah92
  • 671
  • 4
  • 12
  • 29
  • What's the line `join.b.history e`? Looks like a typo. Also, all join conditions are missing and `group` and `user` are reserved words that cannot be used as unquoted identifiers like this. Please provide valid syntax to work with. – Erwin Brandstetter Dec 03 '14 at 17:48
  • sorry I tried to simplify the syntax of my query to more clearly show the problem I am having, I have also had to change some model names. – Sarah92 Dec 03 '14 at 17:59
  • Join conditions are still missing. Those are essential for an answer. Also, one has to know your relational design (table definitions) to understand where the multiplication happens. Here is a related answer with explanation: http://stackoverflow.com/questions/12464037/two-sql-left-joins-produce-incorrect-result/12464135#12464135 – Erwin Brandstetter Dec 03 '14 at 18:34
  • Added join conditions. since this is a hibernate jpa query, the conditions are specified in the model and not in the query. – Sarah92 Dec 04 '14 at 07:51

1 Answers1

0

Your basic problem seems to be that the join to nuserhistory (n rows per 1 user) multiplies rows, which leads to incorrect values for count() and sum().

Aggregate (count) rows per user before you join in a subquery. Transform the HAVING condition into a WHERE condition accordingly. SQL could look something like this:

SELECT a.username AS name    -- "username" comes from table "usergroup"??
     , sum(c.filesize) AS size
FROM   usergroup   g
JOIN   a.nuser     a ON ???  -- join conditions unknown
JOIN   b.directory b ON ???
JOIN   c.files     c ON ???
JOIN  (
   SELECT user_id, count(*) AS ct
   FROM   b.nuserhistory
   GROUP  BY user_id       -- assuming user_id as foreign key column
   ) d ON d.user_id = a.id
WHERE  d.ct > a.maxCount   -- mixed case name without double quotes?
GROUP  BY a.username;      -- cannot be "a.name"

This is largely guesswork, since you did not disclose your relational model and the query you present still doesn't add up.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228