0

I have a table that keeps multiple different metrics in the same table. There is one "value" column for the value of the metric, and another column that identifies what the metric actually is. Then, I have an epoch timestamp column, a website column and a primary key ID column.

I am trying to create a MySQL view to show each metric as a separate column, and have each row represent the latest value for each metric per day (there could be multiple rows for one metric in one day, I want to then show only the value for the latest time for that day).

Edit: The value column is just the total at that given time, so none of these values should be summed/added together. For page_views, the value would be the total from the beginning of time up until now.

Current Table

+----+---------------+--------------+-----------------+-------+
| ID | TIMESTAMP     | WEBSITE      | METRIC_TYPE     | VALUE |
+----+---------------+--------------+-----------------+-------+
| 1  | 1546610400000 | example1.com | page_views      | 101   |
| 2  | 1546610401000 | example1.com | page_views      | 117   |
| 3  | 1546610400000 | example1.com | unique_visitors | 30    |
| 4  | 1546610401000 | example2.com | page_views      | 22    |
| 5  | 1546610401000 | example2.com | ad_referrals    | 8     |
| 6  | 1546681000300 | example1.com | page_views      | 206   |
+----+---------------+--------------+-----------------+-------+

Note for the above snippet: all timestamps are for 01/04/2019, except for row #6. I would like to create a SQL query to turn that table into the below view:

Desired View

+------------+--------------+------------+-----------------+--------------+
| DATE       | WEBSITE      | PAGE_VIEWS | UNIQUE_VISITORS | AD_REFERRALS |
+------------+--------------+------------+-----------------+--------------+
| 01/04/2019 | example1.com | 117        | 30              | NULL         |
| 01/04/2019 | example2.com | 22         | NULL            | 8            |
| 1/05/2019  | example1.com | 206        | NULL            | NULL         |
+------------+--------------+------------+-----------------+--------------+

I am aware of how to convert the timestamp into a date using DATE_FORMAT(FROM_UNIXTIME(floor(timestamp/1000)) and TIME(FROM_UNIXTIME(floor(timestamp/1000))).

What I need is to pivot these results, and only choose the latest record for each metric per day.

I have tried re-joining the table several times over, but am not close to the result I'm looking for (situation is with confidential data, so this data is for different metrics but in the same format).

zgillis
  • 175
  • 3
  • 14
  • If appropriate, consider handling issues of data display in application code. – Strawberry Feb 28 '19 at 18:00
  • I think the first value for the `page_views` column should be `218` instead of `117`. – Barbaros Özhan Feb 28 '19 at 18:57
  • Unfortunately, my task requires I use Tableau to visualize this data. Otherwise, I would be much happier writing an application to manipulate the data, and that would honestly be way simpler! – zgillis Feb 28 '19 at 19:37
  • @BarbarosÖzhan, those aren't separate counts, they're running totals, where one time is later than the other, so they wouldn't be added. – zgillis Feb 28 '19 at 19:38

2 Answers2

2

You can use conditional aggregation, but you need to filter down to the last row per day:

select date(FROM_UNIXTIME(floor(timestamp/1000)) as date, website,
       max(case when metric_type = 'page_views' then value end) as page_views,
       max(case when metric_type = 'unique_visitors' then value end) as unique_visitors,
       max(case when metric_type = 'ad_referrals' then value end) as ad_referrals
from t
where t.timestamp = (select max(t2.timestamp)
                     from t t2
                     where date(FROM_UNIXTIME(floor(t2.timestamp/1000)) = date(FROM_UNIXTIME(floor(t.timestamp/1000)) and
                           t2.website = t.website and
                           t2.metric_type = t.metric_type
                    )
group by date, website;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This is close to what I am looking for. I should have been more clear above, but the entries in the table are just logged statistics from the time they were logged. So for example: an entry for page_views at 1pm would not be added to one from 2pm. Since each is just the total at a given time. – zgillis Feb 28 '19 at 21:43
  • @zgillis . . . If you actually ran this, you would have seen that it doesn't actually sum anything. The `where` clause filters everything down to one metric per website per day (this assumes that the same metric doesn't come in with exactly the same timestamp). In any case, `max()` specifically addresses your concern, but `sum()` does the same thing. – Gordon Linoff Feb 28 '19 at 21:45
  • Thanks, I figured out what you mean. Works! – zgillis Feb 28 '19 at 21:58
1

You can use conditional aggregation as

select DATE( FROM_UNIXTIME( floor(timestamp/1000) ) ) as date, website,
       sum(case when metric_type = 'page_views' then value end) as page_views,
       sum(case when metric_type = 'unique_visitors' then value end) as unique_visitors,
       sum(case when metric_type = 'ad_referrals' then value end) as ad_referrals
  from yourTable
group by date, website;

+------------+--------------+------------+-----------------+--------------+
| DATE       | WEBSITE      | PAGE_VIEWS | UNIQUE_VISITORS | AD_REFERRALS |
+------------+--------------+------------+-----------------+--------------+
| 01/04/2019 | example1.com | 218        | 30              | NULL         |
| 01/04/2019 | example2.com | 22         | NULL            | 8            |
| 01/05/2019 | example1.com | 206        | NULL            | NULL         |
+------------+--------------+------------+-----------------+--------------+

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55