0

We have a RAILS app with Postgres as database and have a use case for drawing a graph over a range of values. Unfortunately the range is a decimal, so I am not able to use the generate_series function of Postgres. Need help in figuring out an optimal way to query this rather than splitting this into 10 different queries. Here's sample data

  • We have a table with score | students
  • Given a query I would get a set of score-student tuples, from which I get range(min(score), max(score)). For example range(10.25, 16.80)
  • We need to break the above range into 10 steps with an interval of 0.655 which is (max-min)10 - 10.25,10.91,11.56,12.22,12.87
  • For each step above show the number of students between that score and previous value
  • Result would be an array with [(10.25,11232),(10.91,2434),....]

Any way/thoughts to do this in Postgres in a single query or less than 10+ queries?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
sumit
  • 436
  • 3
  • 15

1 Answers1

1

Your result as set (makes more sense to me):

WITH base AS (
   SELECT student, score
   FROM   tbl
   WHERE  <some_condition>
   )
, border AS (
   SELECT min(score) AS min_score, max(score) AS max_score
   FROM   base
   )
SELECT lower_bound, ct
FROM  (
   SELECT step
        , min_score + ((max_score - min_score) * (step-1)) / 10 AS lower_bound
   FROM   border, generate_series(1,10) step
   ) x
LEFT   JOIN (
   SELECT width_bucket(b.score, x.min_score, x.max_score, 10) AS step
        , count(*)::int AS ct
   FROM   border x, base b
   GROUP  BY step
   ) y USING (step)
ORDER  BY step;

Featuring two CTEs, generate_series() (still useful) and the often overlooked function width_bucket().

To produce an array of composite types, like outlined in the question, first create a matching type (once):

CREATE TYPE my_type AS (bound numeric, ct int);

Assuming numeric values for lack of information.
Then feed above query to an array constructor:

SELECT ARRAY (
   <query from above>
   SELECT (lower_bound, ct::int)::my_type   -- only difference
   <query from above>
   );
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    @erwin-brandsetter, thanks for the detailed answer and apologies for missing postgres version and sqlfiddle (wasn't aware). I am on 9.3 so have access to the functions you mentioned. width_bucket() seems the right tool for my use case. With your answer I am able to get what I want. I also ran into http://tapoueh.org/blog/2014/02/21-PostgreSQL-histogram, which helped some more. – sumit Jun 24 '14 at 03:06