
When I run multiple joins on the same table, the first join seems to be the only one to go through.

For example, I'll get results like this:

ID, NAME, 200, 200
ID, NAME, 150, 150
ID, NAME, 100, 100

Obviously, the actual ticket count should not be the same as the time entry count.

select 
    contact.aid aid,
    (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
    count(ticket) tickets,
    count(time) entries
from caches contact
inner join caches ticket
    on ticket.name = 'Ticket' and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner join caches time
    on time.name = 'TimeEntry' and (time.data ->> 'TicketID')::numeric = ticket.aid
where 
    contact.name='Contact'
group by
    contact.aid,
    username
order by 
    tickets desc
;

I should be getting results like:

ID, NAME, 200, 421
ID, NAME, 150, 312
ID, NAME, 100, 152

3 Answers


Can you please see if this works for you? This is the method I use when different related record types are contained within a single table.

If the first case column raises an error, then change the numeric casts to whatever type aid actually is (likely int or bigint).


select case name
         when 'Contact' then aid
         when 'Ticket' then (data->>'CreatorResourceID')::numeric
         when 'TimeEntry' then (data->>'TicketID')::numeric
       end as aid,
       max (
         case 
           when name = 'Contact' 
             then concat(
                    data->>'FirstName',
                    ' ',
                    data->>'LastName'
                  )
           else null
         end
       ) as username, 
       count(*) filter (where name = 'Ticket') as tickets,
       count(*) filter (where name = 'TimeEntry') as entries
  from caches
 group by aid
 order by tickets desc;
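Note that count(*) filter (where ...) requires Postgres 9.4 or later. On older versions, the two count lines above can be swapped for the classic CASE form (same logic, just a sketch):

count(case when name = 'Ticket'    then 1 end) as tickets,
count(case when name = 'TimeEntry' then 1 end) as entries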
        
Mike Organek

The principal problem is the same as here:

It's a little more obscured in your case by the values being nested in a jsonb column, but it's the same problem.

Aggregate first, join later:

SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , sum(ticket.tickets) AS tickets
     , sum(ticket.entries) AS entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry' 
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
GROUP  BY contact.aid, username
ORDER  BY tickets DESC;  -- the output alias; ticket.tickets is not in GROUP BY

Assuming that aid, or at least (aid, username), is unique in the base table, we don't need the outer aggregate at all:

SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , ticket.tickets
     , ticket.entries
FROM   caches AS contact
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , sum(entry.entries)::int AS entries
   FROM   caches AS ticket
   CROSS  JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry' 
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry  -- was: "time"
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid  -- numeric?
   ) AS ticket
WHERE  contact.name = 'Contact'
ORDER  BY ticket.tickets DESC;

Not only does it avoid the primary error of multiplied counts, it also typically makes the query faster.

Related:

You have INNER JOIN in your original query, which should probably be LEFT JOIN ... ON true to avoid eliminating users with no valid entries. It's safe to convert it to a CROSS JOIN in my solution because each subquery level is guaranteed to return exactly one row (aggregate functions without GROUP BY). See:
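For illustration, here is a sketch of the outer-join variant; with plain aggregates (which always return exactly one row) it behaves the same as the CROSS JOIN version above, so nothing here is new logic:

SELECT contact.aid
     , concat_ws(' ', contact.data->>'FirstName', contact.data->>'LastName') AS username
     , COALESCE(ticket.tickets, 0) AS tickets
     , COALESCE(ticket.entries, 0) AS entries
FROM   caches AS contact
LEFT   JOIN LATERAL (
   SELECT count(*)::int AS tickets
        , COALESCE(sum(entry.entries), 0)::int AS entries
   FROM   caches AS ticket
   LEFT   JOIN LATERAL (
      SELECT count(*)::int AS entries
      FROM   caches AS entry
      WHERE  entry.name = 'TimeEntry'
      AND   (entry.data ->> 'TicketID')::numeric = ticket.aid
      ) AS entry ON true
   WHERE  ticket.name = 'Ticket'
   AND   (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
   ) AS ticket ON true  -- "ON true" since all filtering happens inside the subquery
WHERE  contact.name = 'Contact'
ORDER  BY tickets DESC;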

About the LATERAL join:

Casting to integer (::int) in the subqueries is optional (and assumes that counts never exceed the integer range). It avoids escalating to numeric, which is more expensive to sum up.
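To see the escalation: count() returns bigint, and sum(bigint) yields numeric, while sum(int) yields bigint. A quick check, independent of the table at hand:

SELECT pg_typeof(sum(c))      AS without_cast  -- numeric
     , pg_typeof(sum(c::int)) AS with_cast     -- bigint
FROM  (SELECT count(*) AS c FROM generate_series(1, 3)) sub;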

Why concat_ws()? See:

And do data ->> 'TicketID' and data ->> 'CreatorResourceID' have to be numeric? Would seem like they should be integer.

Aside: Normalizing your data model (at least to some extent) would probably help your cause. Joining tables on data values nested in a jsonb column is comparatively expensive, and can typically be made much more efficient.
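A minimal sketch of what a (partially) normalized layout could look like; all table and column names here are hypothetical, chosen to mirror the nested keys:

CREATE TABLE contact (
  aid        int PRIMARY KEY
, first_name text
, last_name  text
, data       jsonb  -- remaining attributes can stay nested
);

CREATE TABLE ticket (
  aid                 int PRIMARY KEY
, creator_resource_id int NOT NULL REFERENCES contact  -- was data ->> 'CreatorResourceID'
, data                jsonb
);

CREATE TABLE time_entry (
  aid       int PRIMARY KEY
, ticket_id int NOT NULL REFERENCES ticket  -- was data ->> 'TicketID'
, data      jsonb
);

With plain integer join columns, the joins become cheap and the casts go away entirely.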

Erwin Brandstetter
    This is invaluable guidance. I appreciate all the effort you put into providing this information. I'll read up on the provided documentation. Thank you so much! – Sean Sanker Jr. Sep 26 '20 at 11:05

I am guessing that you are joining along two separate dimensions and hence getting the wrong results.

If so, you can use count(distinct). This is a bit of a guess, but perhaps:

count(distinct ticket) as tickets,
count(distinct time) as entries
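Applied to the query from the question, that guess might look like this (count(distinct ticket.aid) would likely be cheaper than distinct over the whole row, if aid is unique):

select 
    contact.aid aid,
    (contact.data ->> 'FirstName') || ' ' || (contact.data ->> 'LastName') username,
    count(distinct ticket) tickets,  -- distinct over the whole joined row
    count(distinct time) entries
from caches contact
inner join caches ticket
    on ticket.name = 'Ticket' and (ticket.data ->> 'CreatorResourceID')::numeric = contact.aid
inner join caches time
    on time.name = 'TimeEntry' and (time.data ->> 'TicketID')::numeric = ticket.aid
where 
    contact.name = 'Contact'
group by
    contact.aid,
    username
order by 
    tickets desc;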
Gordon Linoff