1

I have complex SQL query in PostgreSQL 9.4.4:

SELECT
  p.id,
  p.name,
  p.page_variant_id,
  p.variant_name,
  (
    SELECT COUNT(*) FROM page_views
    INNER JOIN unique_page_visits upv ON upv.id = page_views.unique_page_visit_id
    WHERE page_views.page_id = p.id AND upv.updated_at >= '2015-08-15' AND
          upv.updated_at <= '2015-08-22'
  ) as views_count,
  (
    SELECT COUNT(*) FROM unique_page_visits upv
    WHERE upv.page_id = p.id  AND upv.updated_at >= '2015-08-15' AND
          upv.updated_at <= '2015-08-22'
  ) as page_visits_count,
  (
    SELECT COUNT(*) FROM conversions
    INNER JOIN conversion_goals cg ON cg.id = conversions.conversion_goal_id
    INNER JOIN unique_page_visits upv ON upv.id = conversions.unique_page_visit_id
    WHERE cg.page_id = p.id  AND conversions.updated_at >= '2015-08-15' AND
          conversions.updated_at <= '2015-08-22' AND cg.name = 'popup'
  ) as conversions_count
FROM
  pages p
WHERE
  p.page_variant_id = '25'
ORDER BY
  p.id ASC

Sample result:

 id | name | page_variant_id | variant_name | views_count | page_visits_count | conversions_count 
----+------+-----------------+--------------+-------------+-------------------+-------------------
 73 | a    |              25 | Original     |           1 |                 1 |                 1
(1 row)

I don't know if this query is written in the best way, but it works.
Any improvements are welcome! - removing redundancy in SELECT sub-queries e.g.:

AND upv.updated_at >= '2015-08-15' AND upv.updated_at <= '2015-08-22'

The problem is that I will have to group results by day. Every single day must appear in the results, even if no rows were found for that day.

I could re-use this code (I slightly modified this; credit to Erwin Brandstetter):

SELECT *
FROM  (SELECT generate_series('2015-08-15'::date
                            , '2015-08-22'::date
                            , '1 day'::interval)::date) AS d(day)
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= '2007-12-01'::date
   AND    date_col <= '2008-12-06'::date
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY 1;

The main problem is that I will need to LEFT JOIN on the field created_at (cast to date) to the tables page_views, conversions and unique_page_visits, not on pages table from (main-query, not sub-query in SELECT area).

Pseudo-code:

SELECT * 
FROM
    (SELECT generate_series('2015-08-15'::date
                          , '2015-08-22'::date
                          , '1 day'::interval)::date) AS d(day)

LEFT JOIN (
  SELECT day_from_subquery_not_from_pages::data AS day
  -- other stuff to return proper results AND conditions
) t USING(day)   

Is that even possible?

OR maybe I will have to just split this one large query into sub-queries (I will have 3 then...) and then use UNION to join results? Then I can JOIN ON days from sub-queries ...

What is the best way to achieve this?

Community
  • 1
  • 1
nothing-special-here
  • 11,230
  • 13
  • 64
  • 94
  • I cleaned up my [referenced answer](http://stackoverflow.com/a/15733103/939860) a bit after your reminder. It's better to move set-returning functions like `generate_series()` to the `FROM` list. – Erwin Brandstetter Aug 20 '15 at 23:42
  • 1
    As always, the definition of underlying tables is essential. Exact data types and constraints are relevant to devise the best query. You could provide a test case in an sql fiddle ([random example from today](http://sqlfiddle.com/#!15/2556c/2)). – Erwin Brandstetter Aug 20 '15 at 23:48

1 Answers1

1

Guessing missing details, this query might be what you are looking for:

WITH p AS (
   SELECT '2015-08-15'::date AS a, '2015-08-22'::date AS z  -- enter bounds once
        , id, name, page_variant_id, variant_name
   FROM   pages
   WHERE  page_variant_id = '25'   -- enter ID once
   )
SELECT p.id, p.name, p.page_variant_id, p.variant_name
     , day, v.views_count, pv.page_visits_count, c.conversions_count
FROM   p
     , LATERAL (SELECT day::date FROM generate_series(p.a, p.z, interval '1 day') day) d
LEFT   JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS views_count
   FROM                      p
   JOIN   page_views         pv  ON pv.page_id = p.id
   JOIN   unique_page_visits upv ON upv.id = pv.unique_page_visit_id
   WHERE  upv.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) v USING (day)
LEFT JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS page_visits_count
   FROM                      p
   JOIN   unique_page_visits upv ON upv.page_id = p.id
   WHERE  upv.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) pv USING (day)
LEFT JOIN (
   SELECT upv.updated_at::date AS day, count(*) AS conversions_count
   FROM                      p
   JOIN   conversion_goals   cg  ON cg.page_id = p.id
   JOIN   conversions        c   ON c.conversion_goal_id = cg.id
   JOIN   unique_page_visits upv ON upv.id = c.unique_page_visit_id
   WHERE  cg.name = 'popup'
   AND    c.updated_at BETWEEN p.a AND p.z
   GROUP  BY 1
   ) c USING (day)
ORDER  BY day;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Wow, it works, you are a genius Erwin! :) I have to change SELECT clause to `SELECT p.id, p.name, p.page_variant_id, p.variant_name, views_count, page_visits_count, conversions_count, day` to get correct results: http://codepad.org/RujhA4l2 . If I were in Vienna, I would buy you a beer :D – nothing-special-here Aug 21 '15 at 16:28
  • @nothing-special-here: Ah, right, forgot to add the counts to the final `SELECT`. I'll have a virtual beer later, then. Cheers! – Erwin Brandstetter Aug 21 '15 at 16:56