3

I've got a table in my Postgres DB that looks something like this:

date          duration
2018-05-10      10
2018-05-12      15
2018-06-01      10
2018-06-02      20
2019-01-01      5
2019-01-02      15
2019-04-01      10

And I wish to sum the values for each month and group them by year, month and, month number into something like this:

year    month    month_number   monthly_sum
2018    May         5              25
2018    June        6              30
2019    Jan         1              20
2019    Apr         4              10

And ended up with a query that looks like this:

SELECT 
  to_char(date_trunc('month', date), 'YYYY') AS year,
  to_char(date_trunc('month', date), 'Mon') AS month,
  to_char(date_trunc('month', date), 'MM') AS month_number,
  sum(duration) AS monthly_sum
FROM timesheet 
GROUP BY year, month, month_number

And it works just fine, my question is: is this query considered bad? Will it affect performance if I have like.. 100k rows? I heard using to_char is inferior to date_trunc, which is what I tried to avoid here, I just wrapped the date_trunc in a to_char. Also, having three values in a GROUP BY clause, does it affect anything?

erikvm
  • 858
  • 10
  • 30
  • you could use [extract](https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT) instead of to_char – Jorge Campos Feb 21 '19 at 18:46

3 Answers3

5

The query is not bad, but you can simplify it.

SELECT to_char(date_trunc('month', date), 'YYYY') AS year,
       to_char(date_trunc('month', date), 'Mon') AS month,
       to_char(date_trunc('month', date), 'MM') AS month_number,
       sum(duration) AS monthly_sum
FROM timesheet 
GROUP BY date_trunc('month', date);

From a performance perspective, shorter GROUP BY keys would have a small impact on performance, but that is not something I would worry about.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

using functions and grouping using them accordingly may degrade performance. It is preferable to have Calendar table with proper indexes for this purpose, so that you won't need to deal with such issues on every table.

Check This and this (Calendar Table)

Derviş Kayımbaşıoğlu
  • 28,492
  • 4
  • 50
  • 72
0

Since your query does not have any filtering condition, it's always reading all the rows of the table: this is the major impact in performance. Had you had filtering conditions, you could be better of having the right indexes.

Having said that, the way you are extracting years and months could be marginally improved as other answers here show, but that will have little impact in the performance of the query.

In sum, in the absence of filtering conditions your query is close to optimal.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • If I were to add a filter to query for a specific year only, where would the `WHERE` clause go? I can't just go `...FROM timesheet WHERE YEAR = 2019` because the column year doesn't exist – erikvm Feb 21 '19 at 19:05
  • 1
    `WHERE extract(year from date) = 2019` – Jorge Campos Feb 21 '19 at 19:07