2

I am trying to put together a report for the performance of one advert showing the views and clicks that it's had on one day. The views and clicks are stored in separate tables with different structures so I think I have to perform a union.

I have read and understand this fantastic piece. It has helped me but I think this is one level more complex than the examples explained there. Would like some help from the community.

This is my views table, which stores a counter of the amount of views that an advert has had on one day.

+-------------+--------------+
| COLUMN_NAME | COLUMN_TYPE  |
+-------------+--------------+
| ad_day_id   | bigint(13)   |
| advert_id   | bigint(20)   |
| date        | date         |
| views       | mediumint(6) |
+-------------+--------------+

This is my clicks table, which stores every click individually. (some columns left out as they are not relevant to the question)

+-------------+---------------------+
| COLUMN_NAME | COLUMN_TYPE         |
+-------------+---------------------+
| id          | bigint(20) unsigned |
| advert_id   | bigint(20)          |
| timestamp   | timestamp           |
+-------------+---------------------+

The result should look like (no real numbers used, just to show the format):

+------------+-------+--------+
| event_date | views | clicks |
+------------+-------+--------+
| 2016-05-09 |    25 |      4 |
| 2016-05-10 |     2 |        |
| 2016-05-11 |   105 |     10 |
| 2016-05-13 |    96 |      7 |
| 2016-05-14 |       |      1 |
+------------+-------+--------+

Regarding the results:

  • Not every date will have clicks or views
  • Some date may have views and no clicks
  • Some date may have clicks and no views

Off to the code... this is what I currently have:

SELECT
    $views_table.date AS event_date,
    $views_table.views,
    '' AS clicks
FROM
    $views_table
WHERE
    ( $views_table.date BETWEEN '$from_date' AND '$to_date' )
    AND $views_table.advert_id=$advert_id
UNION
SELECT
    CAST($clicks_table.timestamp AS DATE) AS event_date,
    '' AS views,
    COUNT($clicks_table.advert_id) AS clicks
FROM
    $clicks_table
WHERE
    ( CAST($clicks_table.timestamp AS DATE) BETWEEN '$from_date' AND '$to_date' )
    AND $clicks_table.advert_id=$advert_id
GROUP BY
    event_date
ORDER BY
    event_date ASC;

Some notes on the code:

  • The clicks are stored individually on timestamp and therefore must be cast to dates and then grouped by date (or at least that is how I got valid results for a different report).
  • The report will feature a date range and is for one specific advert. This is explains the where clauses.

While composing this question I formatted the code a bit better and for reading ease I changed the order of the select statements which resolved my initial problem. Apparently both selects must feature the same columns AND be in the same order.

I think I am nearly there because this is my current result:

+------------+-------+--------+
| event_date | views | clicks |
+------------+-------+--------+
| 2016-05-09 | 1     |        |
| 2016-05-09 |       | 1      |
| 2016-05-10 | 2     |        |
| 2016-05-11 | 105   |        |
| 2016-05-11 |       | 7      |
| 2016-05-13 | 96    |        |
| 2016-05-13 |       | 16     |
| 2016-05-14 | 2     |        |
| 2016-05-14 |       | 1      |
| 2016-05-15 | 2     |        |
| 2016-05-15 |       | 2      |
+------------+-------+--------+

My remaining problem is with the duplicate dates. How can I resolve this?
Many thanks to those who are kind enough to answer!

Community
  • 1
  • 1
FreshSnow
  • 394
  • 2
  • 12
  • "I think I am nearly there" - you are. Put your code in a subquery and use `GROUP BY event_date`. You will need to adjust the SELECT. – Paul Spiegel May 15 '16 at 11:12

2 Answers2

1

It is not a simple union you need, but a union and joins. So, you need a union to get a combined list of dates from both the views and and the clicks table. Then you need to left join the viewes and clicks tables on the list of dates:

select ds.event_date, max(v.views) views, count(c.clicks) clicks
from
    (select distinct date as event_date from views
     union distinct
     select distinct date(timestamp) from clicks) ds
left join views v on ds.event_date=v.date
left join clicks c on ds.event_date=date(c.timestamp)
where ...
group by ds.event_date
Shadow
  • 33,525
  • 10
  • 51
  • 64
  • You need to filter your subselects by `advert_id`. And (i'm not sure but) wouldn't `union distinct` make `select distinct` redundant? – Paul Spiegel May 15 '16 at 11:18
  • @PaulSpiegel I'm not sure either, therefore I kept the select distincts. It is probably redundant in the 1st query. – Shadow May 15 '16 at 11:48
  • [Doc](http://dev.mysql.com/doc/refman/5.7/en/union.html) is not very precise: "duplicate rows are removed from the result". But i've tested on [sqlfiddle](http://sqlfiddle.com/#!9/adbd9/7). However - if `views.date` isn't distinct already, you'll get wrong click count because of the double join. But that is OPs problem :-) – Paul Spiegel May 15 '16 at 12:09
  • Thanks Shadow and @PaulSpiegel, I have updated my code and after a number of tries I now get results. But the results are not correct. For the views it seems they come from random adverts (perhaps the first one to register a view for that day?) and the clicks are numbers I can't retrace. Can you please have a look at my updated code? I created this Pastie because the comments field is too small: http://pastie.org/private/uvobpjedjvgrciqlfg – FreshSnow May 15 '16 at 13:42
  • @Frank - Looks like you have mixed my suggestion from comments and Shadows solution. That will not work. Try one of them. Better - try both, but separately. – Paul Spiegel May 15 '16 at 13:51
  • Sorry about that... Trying my best here :) I have also tried to get everything from the subselect and that gives valid results for the views, but no clicks data: http://pastie.org/private/dnpuyjcnaa1can0wprjka – FreshSnow May 15 '16 at 13:56
  • This solution works fine with some modifications: http://sqlfiddle.com/#!9/796b8/1 – Paul Spiegel May 15 '16 at 14:33
1

I modified your query a little bit (see inline comments) and wrapped it in a subquery to use GROUP BY event_date in the outer query.

SELECT event_date, MAX(views) AS views, MAX(clicks) AS clicks
FROM (
    SELECT
        views.date AS event_date,
        views.views,
        0 AS clicks -- '' causes strange results on sqlfiddle
    FROM
        views
    WHERE
        ( views.date BETWEEN '2016-05-09' AND '2016-05-15' )
        AND views.advert_id=1
    UNION
    SELECT
        CAST(clicks.timestamp AS DATE) AS event_date,
        0 AS views, -- '' causes strange results on sqlfiddle
        COUNT(clicks.advert_id) AS clicks
    FROM
        clicks
    WHERE
        ( CAST(clicks.timestamp AS DATE) BETWEEN '2016-05-09' AND '2016-05-15' )
        AND clicks.advert_id=1
    GROUP BY
        event_date
    -- ORDER BY is useless here
) sub
GROUP BY event_date
ORDER BY event_date

Demo

Instead of CAST(clicks.timestamp AS DATE) you can also use DATE(clicks.timestamp) and hope that MySQL will use indexes in the future.

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
  • Thanks very much Paul, that works like a charm and I went with all your suggestions in my code. The DATE() command works fine. Woohoo, very happy!! – FreshSnow May 16 '16 at 11:14