0

I am querying a postgresql 9.4 database and I want to perform calculations using columns inside the same query.

The result I am trying to obtain is a partial value based on the number of days passed out of a total_days amount. E.g.

  • start_date: 01/01/2016,
  • duration_in_months: 2,
  • total_days: 60,
  • value_x: 120.

If I launch the query today, 05/01/2016, I want to obtain:

partial_result = value_x * passed_days / total_days
                  120   *      5      /    60

In my dataset, I have over 100k records and I need to get this partial value grouped by month (adding the partials month by month).

=========================================================================

In MySQL I am able to do calculation as follows:

SELECT 
  start_date,
  duration_in_months, 
  @end_date:= DATE_ADD(start_date, INTERVAL duration_in_months MONTH) as end_date,
  @total_days:= DATEDIFF(@end_date, start_date),
  @passed_days:= DATEDIFF(CURDATE(), start_date),
  value_x,
  (value_x * @passed_days / @total_days) as partial_result

  FROM table;

Following the instructions found in this question previously asked, I am currently using in PostgreSQL a query like:

SELECT
  start_date,
  duration_in_months,
  end_date,
  total_days,
  value_x,
  (value_x * passed_days / total_days) as partial_result

  FROM (SELECT *,
         (start_date + (duration_in_months || ' month')::INTERVAL) as end_date,
         EXTRACT(DAY FROM (start_date + (duration_in_months || ' month')::INTERVAL) - start_date) as total_days, 
         EXTRACT(DAY FROM current_date - start_date) as passed_days
        FROM table) as table1;

I would need your help in order to:

  • use calculated variables in PostgreSQL like in MySQL using formulas in the query or find another way to make the query more readable
  • group the partials results by months
  • insert a where clause to ensure that

    passed_days >= 0 and passed_days <= total_days

Thank you very much in advance and feel free to ask for more details.

Community
  • 1
  • 1
mabe02
  • 2,676
  • 2
  • 20
  • 35
  • Could you please provide some more details, like the DDL of the table you query? You could use http://sqlfiddle.com/ for that too ;) – Thomas Berger Jan 05 '16 at 15:19
  • CREATE TABLE subscribed_guarantees ( id integer, insurance_company character varying, gross_pdu numeric(8,2), tax numeric(8,2), start_date timestamp without time zone, duration_in_months integer, "number" integer, license_plate character varying, customer_surname character varying, customer_name character varying, dealer_name character varying, product_description character varying, car_info_model character varying, car_info_brand character varying, insurance_policy_document_name character varying, acceptance_document_name character varying ) – mabe02 Jan 05 '16 at 15:55
  • Grouping the partial results by month makes a little sense to me, please describe what you want as results: grouping makes sense only if all other data is aggregated (f.ex. do you want to select an average of partial results by each month? or max? something else?) – pozs Jan 05 '16 at 15:58
  • I need to produce a report about something acting like an accrued interest by month. Which means that for a contract subscribed in jul-15, I need to sum the interest accrued in july, while in august I need to sum the partial of july and the partial of august. At the same time, I have to make sure that I am not calculating the interest over the expiracy date (passed_days <= total_days) – mabe02 Jan 05 '16 at 16:15

3 Answers3

1

First, your MySQL query is not guaranteed to work. MySQL documentation is quite explicit that the order of evaluation of expressions in the SELECT can be arbitrary. Hence, the last expression could be evaluated before the variables are set (well, actually they would be set to values in the previous row).

In Postgres, I think you have the right idea with a subquery or CTE. You just reference the columns without a @. I don't know if the specific date arithmetic is correct, but this is the equivalent query:

SELECT start_date, duration_in_months, end_date, total_days, value_x,
       (value_x * passed_days / total_days) as partial_result
FROM (SELECT t.*,
             (start_date + (duration_in_months || ' month')::INTERVAL) as end_date,
             EXTRACT(DAY FROM (start_date + (duration_in_months || ' month')::INTERVAL) - start_date) as total_days, 
             EXTRACT(DAY FROM current_date - start_date) as passed_days
      FROM table t
     ) t;

The extract(day) looks wrong to me, but you are extracting the day from an interval not a date/time expression. I think it does what you want.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for your answer. So you think that there isn't any better way to do it in Postgres? How would you group the partial results by month? And how to ensure that passed_days are either max (passed_days, 0) and min(passed_days, total_days)? – mabe02 Jan 05 '16 at 15:03
  • @mabe02 . . . I didn't see any such checks in the MySQL code. But in either database you could use `least()` and `greatest()`. – Gordon Linoff Jan 06 '16 at 02:52
1

Because your expressions use each other, you should use multiple subqueries (if you don't want to repeat any expression).

Or, you could use LATERAL subqueries, f.ex:

SELECT  start_date,
        duration_in_months, 
        end_date,
        total_days,
        passed_days,
        value_x,
        (value_x * passed_days / total_days) as partial_result
FROM    table,
LATERAL (SELECT (start_date + (duration_in_months * INTERVAL '1 month'))::date end_date) end_date,
LATERAL (SELECT end_date - start_date::date total_days) total_days,
LATERAL (SELECT current_date - start_date::date passed_days) passed_days

DATEDIFF can be calculated with date1 - date2 in PostgreSQL, no need to use EXTRACT (but arguments must have a type of date; timestamp(tz) difference produces intervals).

You can use GREATEST and LEAST for constraining passed_days (if you want to select all rows), but you can use passed_days in WHERE too, if you want.

pozs
  • 34,608
  • 5
  • 57
  • 63
0

I found a proper solution to do it in PostgreSQL:

  • grouping by month: use with table as ( ) statement at the beginning of he query. Then do an inner join
  • declare variables: use subqueries

=========================================================================

WITH time_ranges AS (
SELECT       to_date('2014-07-01', 'yyyy-mm-dd') as START_DATE, to_date('2014-07-31', 'yyyy-mm-dd') as END_DATE
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2014-08-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2014-09-30', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2014-10-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2014-11-30', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2014-12-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-01-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-02-28', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-03-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-04-30', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-05-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-06-30', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-07-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-08-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-09-30', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-10-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-11-30', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2015-12-31', 'yyyy-mm-dd')
  UNION SELECT to_date('2014-07-01', 'yyyy-mm-dd'), to_date('2016-01-05', 'yyyy-mm-dd')
)

SELECT time_ranges.end_date, round(SUM(gross_pdu * LEAST(total_days, GREATEST( EXTRACT(DAY FROM(time_ranges.end_date - guarantees_days.start_date)), 0) ) / total_days)::numeric, 2)
FROM
(SELECT
  *,
  EXTRACT(DAY FROM (start_date + (duration_in_months || ' month')::INTERVAL) - start_date) as total_days
FROM subscribed_guarantees
) as guarantees_days
INNER JOIN
time_ranges ON
time_ranges.start_date <= guarantees_days.start_date AND guarantees_days.start_date <= time_ranges.end_date
WHERE INSURANCE_COMPANY = 'INSURANCE COMPANY' AND TAX = 13.5
 GROUP BY
  time_ranges.end_date
 ORDER BY
  time_ranges.end_date
mabe02
  • 2,676
  • 2
  • 20
  • 35