1

I have a view like this:

 Year | Month | Week | Category | Value |
 2017 | 1     | 1    | A     | 1
 2017 | 1     | 1    | B     | 2
 2017 | 1     | 1    | C     | 3
 2017 | 1     | 2    | A     | 4
 2017 | 1     | 2    | B     | 5
 2017 | 1     | 2    | C     | 6
 2017 | 1     | 3    | A     | 7
 2017 | 1     | 3    | B     | 8
 2017 | 1     | 3    | C     | 9
 2017 | 1     | 4    | A     | 10
 2017 | 1     | 4    | B     | 11
 2017 | 1     | 4    | C     | 12
 2017 | 2     | 5    | A     | 1
 2017 | 2     | 5    | B     | 2
 2017 | 2     | 5    | C     | 3
 2017 | 2     | 6    | A     | 4
 2017 | 2     | 6    | B     | 5
 2017 | 2     | 6    | C     | 6
 2017 | 2     | 7    | A     | 7
 2017 | 2     | 7    | B     | 8
 2017 | 2     | 7    | C     | 9
 2017 | 2     | 8    | A     | 10
 2017 | 2     | 8    | B     | 11
 2017 | 2     | 8    | C     | 12

And I need to make a new view which needs to show average of value column (let's call it avg_val) and the value from the max week of the month (max_val_of_month). Ex: max week of january is 4, so the value of category A is 10. Or something like this to be clear:

 Year | Month | Category | avg_val | max_val_of_month
 2017 | 1     | A        | 5.5     | 10
 2017 | 1     | B        | 6.5     | 11
 2017 | 1     | C        | 7.5     | 12
 2017 | 2     | A        | 5.5     | 10
 2017 | 2     | B        | 6.5     | 11
 2017 | 2     | C        | 7.5     | 12

I have use window function, over partition by year, month, category to get the avg value. But how can I get the value of the max week of each month?

Rakaziwi
  • 171
  • 4
  • 16
  • How is it `avg_val` is 1 in the desired output? – peterm Jun 02 '17 at 04:13
  • oh, that's not the real result, i am just lazy to calculate the real average result. the average value is result from calculation average of weeks in the month. ex: january -> avg(week 1, 2, 3, 4); february -> avg(week 5, 6, 7, 8) – Rakaziwi Jun 02 '17 at 06:27

4 Answers4

2

Assuming that you need a month average and a value for the max week not the max value per month

SELECT year, month, category, avg_val, value max_week_val
  FROM (
    SELECT *,
           AVG(value) OVER (PARTITION BY year, month, category) avg_val,
           ROW_NUMBER() OVER (PARTITION BY year, month, category ORDER BY week DESC) rn
      FROM view1
  ) q
 WHERE rn = 1
 ORDER BY year, month, category

or more verbose version without window functions

SELECT q.year, q.month, q.category, q.avg_val, v.value max_week_val
  FROM (
    SELECT year, month, category, avg(value) avg_val, MAX(week) max_week
      FROM view1
     GROUP BY year, month, category
  ) q JOIN view1 v
    ON q.year = v.year
   AND q.month = v.month
   AND q.category = v.category
   AND q.max_week = v.week
 ORDER BY year, month, category

Here is a dbfiddle demo for both queries

peterm
  • 91,357
  • 15
  • 148
  • 157
0
with data (yr, mnth, wk, cat, val) as
(
  -- begin test data
  select  2017 , 1     , 1    , 'A'     , 1 from dual union all
  select  2017 , 1     , 1    , 'B'     , 2 from dual union all
  select  2017 , 1     , 1    , 'C'     , 3 from dual union all
  select  2017 , 1     , 2    , 'A'     , 4 from dual union all
  select  2017 , 1     , 2    , 'B'     , 5 from dual union all
  select  2017 , 1     , 2    , 'C'     , 6 from dual union all
  select  2017 , 1     , 3    , 'A'     , 7 from dual union all
  select  2017 , 1     , 3    , 'B'     , 8 from dual union all
  select  2017 , 1     , 3    , 'C'     , 9 from dual union all
  select  2017 , 1     , 4    , 'A'     , 10 from dual union all
  select  2017 , 1     , 4    , 'B'     , 11 from dual union all
  select  2017 , 1     , 4    , 'C'     , 12 from dual union all
  select  2017 , 2     , 5    , 'A'     , 1 from dual union all
  select  2017 , 2     , 5    , 'B'     , 2 from dual union all
  select  2017 , 2     , 5    , 'C'     , 3 from dual union all
  select  2017 , 2     , 6    , 'A'     , 4 from dual union all
  select  2017 , 2     , 6    , 'B'     , 5 from dual union all
  select  2017 , 2     , 6    , 'C'     , 6 from dual union all
  select  2017 , 2     , 7    , 'A'     , 7 from dual union all
  select  2017 , 2     , 8    , 'A'     , 10 from dual union all
  select  2017 , 2     , 8    , 'B'     , 11 from dual union all
  select  2017 , 2     , 7    , 'B'     , 8 from dual union all
  select  2017 , 2     , 7    , 'C'     , 9 from dual union all
  select  2018 , 2     , 7    , 'C'     , 9 from dual union all
  select  2017 , 2     , 8    , 'C'     , 12 from dual
  -- end test data
)
select * from 
(
  select
    -- data.*: all columns of the data table
    data.*, 
    -- avrg: partition by a combination of year,month and category to work out -
    --       the avg for each category in a month of a year
    avg(val) over (partition by yr, mnth, cat) avrg, 
    -- mwk: partition by year and month to work out -
    --      the max week of a month in a year
    max(wk) over (partition by yr, mnth) mwk 
  from 
    data
)
-- as OP's interest is in the max week of each month of a year, -
-- "wk" column value is matched against 
--      the derived column "mwk"
where wk = mwk 
order by yr,mnth,cat;
Slkrasnodar
  • 824
  • 6
  • 10
  • Thank you for this code snippet, which may provide some immediate help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its educational value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with similar, but not identical, questions. Please [edit] your answer to add explanation, and give an indication of what limitations and assumptions apply. – Toby Speight Jun 02 '17 at 08:31
  • @TobySpeight, thanks for the positive note. Point well taken. – Slkrasnodar Jun 02 '17 at 11:14
0

And here is my NEW version.

My thanks to @peterm for pointing me about the prior false value of val_from_max_week_of_month. So, I corrected it:

SELECT 
    a.Year,
    a.Month,
    a.Category,
    max(a.Week) AS max_week,
    AVG(a.Value) AS avg_val,
    (
        SELECT b.Value 
        FROM decades AS b
        WHERE
            b.Year = a.Year AND
            b.Month = a.Month AND  
            b.Week = max(a.Week) AND 
            b.Category = a.Category
    ) AS val_from_max_week_of_month
FROM decades AS a
GROUP BY 
    a.Year,
    a.Month,
    a.Category
;

The new results:

enter image description here

0

First, you might need to check, how do you handle the first week in January. If 1st of January are not a Monday, there are several interpretations & not every one of them will fit the solutions here. You'll either need to use:

  • the ISO week concept, ie. the week column should hold the ISO week & the year column should hold the ISO year (week-year, rather). Note: in this concept, 1st of January actually sometimes belongs to the previous year
  • use your own concept, where the first week of the year is "split" into two if 1st of January is not a Monday.

Note: the solutions below will not work if (in your table) the first week of January can be 52 or 53.

Given that avg_val is just a simple aggregation, while max_val_of_month can be calculated with typical queries. It has a lot of possible solutions in PostgreSQL, with varying performance. Fortunately, your query will naturally have an easily determined selectivity: you'll always need (approx.) a quarter of your data.

Usual winners (in performance) are:

(These are not surprise though, as these 2 should perform more and more as you need more portion of the original data.)

array_agg() with order by variant:

select   year, month, category, avg(value) avg_val,
         (array_agg(value order by week desc))[1] max_val_of_month
from     table_name
group by year, month, category;

distinct on variant:

select   distinct on (year, month, category) year, month, category,
         avg(value) over (partition by year, month, category) avg_val,
         value max_val_of_month
from     table_name
order by year, month, category, week desc;

The pure window function variant is not that bad either:

row_number() variant:

select year, month, category, avg_val, max_val_of_month
from   (select year, month, category, value max_val_of_month,
               avg(value) over (partition by year, month, category) avg_val,
               row_number() over (partition by year, month, category order by week desc) rn
        from   table_name) w
where  rn = 1;

But the LATERAL variant is only viable with an index:

LATERAL variant:

create index idx_table_name_year_month_category_week_desc
  on table_name(year, month, category, week desc);

select     year, month, category,
           avg(value) avg_val,
           max_val_of_month
from       table_name t
cross join lateral (select   value max_val_of_month
                    from     table_name
                    where    (year, month, category) = (t.year, t.month, t.category)
                    order by week desc
                    limit    1) m
group by   year, month, category, max_val_of_month;

But most of the solutions above can actually utilize this index, not just this last one.

Without the index: http://rextester.com/WNEL86809
With the index: http://rextester.com/TYUA52054

pozs
  • 34,608
  • 5
  • 57
  • 63
  • cool! thanks for sharing the info the difference between table with index and table without index. it will help the performance when i use it to bigger data later! – Rakaziwi Jun 03 '17 at 04:03