Postgresql - How to get value from last record of each month

Question

I have a view like this:

 Year | Month | Week | Category | Value |
 2017 | 1     | 1    | A     | 1
 2017 | 1     | 1    | B     | 2
 2017 | 1     | 1    | C     | 3
 2017 | 1     | 2    | A     | 4
 2017 | 1     | 2    | B     | 5
 2017 | 1     | 2    | C     | 6
 2017 | 1     | 3    | A     | 7
 2017 | 1     | 3    | B     | 8
 2017 | 1     | 3    | C     | 9
 2017 | 1     | 4    | A     | 10
 2017 | 1     | 4    | B     | 11
 2017 | 1     | 4    | C     | 12
 2017 | 2     | 5    | A     | 1
 2017 | 2     | 5    | B     | 2
 2017 | 2     | 5    | C     | 3
 2017 | 2     | 6    | A     | 4
 2017 | 2     | 6    | B     | 5
 2017 | 2     | 6    | C     | 6
 2017 | 2     | 7    | A     | 7
 2017 | 2     | 7    | B     | 8
 2017 | 2     | 7    | C     | 9
 2017 | 2     | 8    | A     | 10
 2017 | 2     | 8    | B     | 11
 2017 | 2     | 8    | C     | 12

And I need to make a new view which needs to show average of value column (let's call it avg_val) and the value from the max week of the month (max_val_of_month). Ex: max week of january is 4, so the value of category A is 10. Or something like this to be clear:

 Year | Month | Category | avg_val | max_val_of_month
 2017 | 1     | A        | 5.5     | 10
 2017 | 1     | B        | 6.5     | 11
 2017 | 1     | C        | 7.5     | 12
 2017 | 2     | A        | 5.5     | 10
 2017 | 2     | B        | 6.5     | 11
 2017 | 2     | C        | 7.5     | 12

I have use window function, over partition by year, month, category to get the avg value. But how can I get the value of the max week of each month?

oh, that's not the real result, i am just lazy to calculate the real average result. the average value is result from calculation average of weeks in the month. ex: january -> avg(week 1, 2, 3, 4); february -> avg(week 5, 6, 7, 8) — Rakaziwi, Jun 02 '17 at 06:27

score 2 · Accepted Answer · answered Jun 02 '17 at 05:06

Assuming that you need a month average and a value for the max week not the max value per month

SELECT year, month, category, avg_val, value max_week_val
  FROM (
    SELECT *,
           AVG(value) OVER (PARTITION BY year, month, category) avg_val,
           ROW_NUMBER() OVER (PARTITION BY year, month, category ORDER BY week DESC) rn
      FROM view1
  ) q
 WHERE rn = 1
 ORDER BY year, month, category

or more verbose version without window functions

SELECT q.year, q.month, q.category, q.avg_val, v.value max_week_val
  FROM (
    SELECT year, month, category, avg(value) avg_val, MAX(week) max_week
      FROM view1
     GROUP BY year, month, category
  ) q JOIN view1 v
    ON q.year = v.year
   AND q.month = v.month
   AND q.category = v.category
   AND q.max_week = v.week
 ORDER BY year, month, category

Here is a dbfiddle demo for both queries

Slkrasnodar · Answer 2 · 2017-06-02T11:13:36.277

with data (yr, mnth, wk, cat, val) as
(
  -- begin test data
  select  2017 , 1     , 1    , 'A'     , 1 from dual union all
  select  2017 , 1     , 1    , 'B'     , 2 from dual union all
  select  2017 , 1     , 1    , 'C'     , 3 from dual union all
  select  2017 , 1     , 2    , 'A'     , 4 from dual union all
  select  2017 , 1     , 2    , 'B'     , 5 from dual union all
  select  2017 , 1     , 2    , 'C'     , 6 from dual union all
  select  2017 , 1     , 3    , 'A'     , 7 from dual union all
  select  2017 , 1     , 3    , 'B'     , 8 from dual union all
  select  2017 , 1     , 3    , 'C'     , 9 from dual union all
  select  2017 , 1     , 4    , 'A'     , 10 from dual union all
  select  2017 , 1     , 4    , 'B'     , 11 from dual union all
  select  2017 , 1     , 4    , 'C'     , 12 from dual union all
  select  2017 , 2     , 5    , 'A'     , 1 from dual union all
  select  2017 , 2     , 5    , 'B'     , 2 from dual union all
  select  2017 , 2     , 5    , 'C'     , 3 from dual union all
  select  2017 , 2     , 6    , 'A'     , 4 from dual union all
  select  2017 , 2     , 6    , 'B'     , 5 from dual union all
  select  2017 , 2     , 6    , 'C'     , 6 from dual union all
  select  2017 , 2     , 7    , 'A'     , 7 from dual union all
  select  2017 , 2     , 8    , 'A'     , 10 from dual union all
  select  2017 , 2     , 8    , 'B'     , 11 from dual union all
  select  2017 , 2     , 7    , 'B'     , 8 from dual union all
  select  2017 , 2     , 7    , 'C'     , 9 from dual union all
  select  2018 , 2     , 7    , 'C'     , 9 from dual union all
  select  2017 , 2     , 8    , 'C'     , 12 from dual
  -- end test data
)
select * from 
(
  select
    -- data.*: all columns of the data table
    data.*, 
    -- avrg: partition by a combination of year,month and category to work out -
    --       the avg for each category in a month of a year
    avg(val) over (partition by yr, mnth, cat) avrg, 
    -- mwk: partition by year and month to work out -
    --      the max week of a month in a year
    max(wk) over (partition by yr, mnth) mwk 
  from 
    data
)
-- as OP's interest is in the max week of each month of a year, -
-- "wk" column value is matched against 
--      the derived column "mwk"
where wk = mwk 
order by yr,mnth,cat;

Thank you for this code snippet, which may provide some immediate help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its educational value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with similar, but not identical, questions. Please [edit] your answer to add explanation, and give an indication of what limitations and assumptions apply. — Toby Speight, Jun 02 '17 at 08:31
@TobySpeight, thanks for the positive note. Point well taken. — Slkrasnodar, Jun 02 '17 at 11:14

score 0 · Answer 3 · 2017-06-02T06:03:16.857

0

And here is my NEW version.

My thanks to @peterm for pointing me about the prior false value of val_from_max_week_of_month. So, I corrected it:

SELECT 
    a.Year,
    a.Month,
    a.Category,
    max(a.Week) AS max_week,
    AVG(a.Value) AS avg_val,
    (
        SELECT b.Value 
        FROM decades AS b
        WHERE
            b.Year = a.Year AND
            b.Month = a.Month AND  
            b.Week = max(a.Week) AND 
            b.Category = a.Category
    ) AS val_from_max_week_of_month
FROM decades AS a
GROUP BY 
    a.Year,
    a.Month,
    a.Category
;

The new results:

edited Jun 02 '17 at 06:03

answered Jun 02 '17 at 05:07

The max value of the month is not the same as the value for the max week of the month – peterm Jun 02 '17 at 05:11
@peterm Thanks. I'll check now. – Jun 02 '17 at 05:15

pozs · Answer 4 · 2017-06-02T09:03:53.447

First, you might need to check, how do you handle the first week in January. If 1st of January are not a Monday, there are several interpretations & not every one of them will fit the solutions here. You'll either need to use:

the ISO week concept, ie. the week column should hold the ISO week & the year column should hold the ISO year (week-year, rather). Note: in this concept, 1st of January actually sometimes belongs to the previous year
use your own concept, where the first week of the year is "split" into two if 1st of January is not a Monday.

Note: the solutions below will not work if (in your table) the first week of January can be 52 or 53.

Given that avg_val is just a simple aggregation, while max_val_of_month can be calculated with typical greatest-n-per-group queries. It has a lot of possible solutions in PostgreSQL, with varying performance. Fortunately, your query will naturally have an easily determined selectivity: you'll always need (approx.) a quarter of your data.

Usual winners (in performance) are:

(These are not surprise though, as these 2 should perform more and more as you need more portion of the original data.)

`array_agg()` with `order by` variant:

select   year, month, category, avg(value) avg_val,
         (array_agg(value order by week desc))[1] max_val_of_month
from     table_name
group by year, month, category;

`distinct on` variant:

select   distinct on (year, month, category) year, month, category,
         avg(value) over (partition by year, month, category) avg_val,
         value max_val_of_month
from     table_name
order by year, month, category, week desc;

The pure window function variant is not that bad either:

`row_number()` variant:

select year, month, category, avg_val, max_val_of_month
from   (select year, month, category, value max_val_of_month,
               avg(value) over (partition by year, month, category) avg_val,
               row_number() over (partition by year, month, category order by week desc) rn
        from   table_name) w
where  rn = 1;

But the LATERAL variant is only viable with an index:

`LATERAL` variant:

create index idx_table_name_year_month_category_week_desc
  on table_name(year, month, category, week desc);

select     year, month, category,
           avg(value) avg_val,
           max_val_of_month
from       table_name t
cross join lateral (select   value max_val_of_month
                    from     table_name
                    where    (year, month, category) = (t.year, t.month, t.category)
                    order by week desc
                    limit    1) m
group by   year, month, category, max_val_of_month;

But most of the solutions above can actually utilize this index, not just this last one.

Without the index: http://rextester.com/WNEL86809
With the index: http://rextester.com/TYUA52054

cool! thanks for sharing the info the difference between table with index and table without index. it will help the performance when i use it to bigger data later! — Rakaziwi, Jun 03 '17 at 04:03

Postgresql - How to get value from last record of each month

4 Answers4

array_agg() with order by variant:

distinct on variant:

row_number() variant:

LATERAL variant:

`array_agg()` with `order by` variant:

`distinct on` variant:

`row_number()` variant:

`LATERAL` variant: