
I'm looking for an elegant way (in terms of syntax, not necessarily efficient) to get the frequency distribution of a decimal range.

For example, I have a table with a ratings column, which can be negative or positive. I want the count of rows whose rating falls into each range:

- ...
- [-140.00 to -130.00): 5
- [-130.00 to -120.00): 2
- [-120.00 to -110.00): 1
- ...
- [120.00 to 130.00): 17
- and so on.

[i to j) means i inclusive to j exclusive.

Thanks in advance.

Mikko

4 Answers


You could get pretty close using `SELECT FLOOR(rating / 10), COUNT(*) FROM (table) GROUP BY 1`.

ethrbunny
Thumbs up for the good answer and amazing handle! But I think `round()` is more appropriate here. It's also important to keep the *scale* and *distribution* of your data in mind, adjusting your rounding precision accordingly. – pim Oct 16 '18 at 14:16
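A minimal sketch of the idea in Python (sample data is made up). Note that `floor`, unlike truncation toward zero or `round()`, puts negative ratings into the correct left-closed bucket, which is exactly the asker's [i to j) semantics:

```python
import math
from collections import Counter

ratings = [-135.5, -132.0, -121.7, -119.9, 124.3, 125.0, 130.0]

# floor(r / 10) * 10 gives the left edge of the [i, i + 10) bucket;
# e.g. -135.5 -> floor(-13.55) * 10 = -140, so it lands in [-140, -130).
buckets = Counter(math.floor(r / 10) * 10 for r in ratings)

for left in sorted(buckets):
    print(f"[{left:.2f} to {left + 10:.2f}): {buckets[left]}")
```

This prints one line per non-empty bucket, matching the format asked for in the question.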

I was thinking of something that could handle many levels, like:

DELIMITER $$
CREATE PROCEDURE populate_stats()
BEGIN
   DECLARE range_loop INT DEFAULT 500;
   DECLARE the_next INT;
   simple_loop: LOOP
      SET the_next = range_loop - 10;
      SELECT SUM(CASE WHEN rating >= the_next AND rating < range_loop
                      THEN 1 ELSE 0 END)
        FROM ratings;
      SET range_loop = the_next;
      IF the_next = -500 THEN
         LEAVE simple_loop;
      END IF;
   END LOOP simple_loop;
END $$
DELIMITER ;



usage: call populate_stats();

This would handle 100 ranges: [490 to 500), [480 to 490), ..., [-490 to -480), [-500 to -490).
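The same descending walk, translated to Python for illustration (the sample data is hypothetical):

```python
# Walk the buckets from 500 down to -500 in steps of 10, counting rows
# in each left-closed band [the_next, range_loop) -- the traversal the
# stored procedure performs.
ratings = [495.0, 488.2, 3.3, -499.9]

range_loop = 500
while range_loop > -500:
    the_next = range_loop - 10
    count = sum(1 for r in ratings if the_next <= r < range_loop)
    if count:
        print(f"[{the_next} to {range_loop}): {count}")
    range_loop = the_next
```

Each sample rating falls into exactly one band, so this prints four lines in descending order of bucket.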

Michael Durrant

Assuming a finite number of ranges:

SELECT
  SUM(CASE WHEN val >= -140 AND val < -130 THEN 1 ELSE 0 END) AS sum_neg140_to_neg130,
  SUM(CASE WHEN val >= -130 AND val < -120 THEN 1 ELSE 0 END) AS sum_neg130_to_neg120,
  ...
FROM ratings

(Explicit `>=`/`<` comparisons are used instead of `BETWEEN`, which is inclusive on both ends and so doesn't match the [i to j) semantics; a `-` in an unquoted column alias is also invalid.)

And if not, you could use dynamic SQL to generate the SELECT for however many ranges you need, although you may run into a column-count limitation.
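One way to sketch that dynamic generation, here in Python (the table and column names `ratings`/`val` are assumptions):

```python
def bucket_columns(lo, hi, step, table="ratings", col="val"):
    """Emit one SUM(CASE ...) column per left-closed bucket [i, i + step)."""
    cols = []
    for i in range(lo, hi, step):
        # '-' is illegal in an unquoted alias, so spell negatives with 'm'.
        alias = f"cnt_{i}_to_{i + step}".replace("-", "m")
        cols.append(
            f"SUM(CASE WHEN {col} >= {i} AND {col} < {i + step} "
            f"THEN 1 ELSE 0 END) AS {alias}"
        )
    return "SELECT\n  " + ",\n  ".join(cols) + f"\nFROM {table}"

print(bucket_columns(-140, -110, 10))
```

For `(-140, -110, 10)` this emits three `SUM(CASE ...)` columns, one per bucket; widen the bounds and the statement grows one column per range, which is where a column-count limit could bite.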

xQbert

Just put your desired ranges into a table, and use that to discriminate the values.

-- SET search_path='tmp';

DROP TABLE IF EXISTS measurements;

CREATE TABLE measurements
        ( zval INTEGER NOT NULL PRIMARY KEY
        );
INSERT INTO measurements (zval)
        SELECT generate_series(1,1000);
DELETE FROM measurements WHERE random() < 0.20 ;

DROP TABLE IF EXISTS ranges;
CREATE TABLE ranges
        ( zmin INTEGER NOT NULL PRIMARY KEY
        , zmax INTEGER NOT NULL
        );
INSERT INTO ranges(zmin,zmax) VALUES
(0, 100), (100, 200), (200, 300),  (300, 400), (400, 500),
(500, 600), (600, 700), (700, 800),  (800, 900), (900, 1000)
        ;

SELECT ra.zmin,ra.zmax
        , COUNT(*) AS zcount
FROM ranges ra
JOIN measurements me
  ON me.zval >= ra.zmin AND me.zval < ra.zmax
GROUP BY ra.zmin,ra.zmax
ORDER BY ra.zmin
        ;

Results:

 zmin | zmax | zcount 
------+------+--------
    0 |  100 |     89
  100 |  200 |     76
  200 |  300 |     76
  300 |  400 |     74
  400 |  500 |     86
  500 |  600 |     78
  600 |  700 |     75
  700 |  800 |     75
  800 |  900 |     80
  900 | 1000 |     82
(10 rows)
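The same ranges-table join runs nearly unchanged on SQLite, as this small self-check in Python shows (`generate_series` is replaced by `executemany`, since SQLite does not ship it by default, and the `DELETE ... random()` step is dropped so the counts are deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (zval INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE ranges (zmin INTEGER NOT NULL PRIMARY KEY,
                         zmax INTEGER NOT NULL);
    INSERT INTO ranges (zmin, zmax) VALUES (0,100), (100,200), (200,300);
""")
# Stand-in for generate_series(1, 250).
conn.executemany("INSERT INTO measurements (zval) VALUES (?)",
                 [(v,) for v in range(1, 251)])

rows = conn.execute("""
    SELECT ra.zmin, ra.zmax, COUNT(*) AS zcount
    FROM ranges ra
    JOIN measurements me
      ON me.zval >= ra.zmin AND me.zval < ra.zmax
    GROUP BY ra.zmin, ra.zmax
    ORDER BY ra.zmin
""").fetchall()

for zmin, zmax, zcount in rows:
    print(zmin, zmax, zcount)   # 99, 100, 51 rows per bucket
```

With values 1..250, [0, 100) holds 1..99 (99 rows), [100, 200) holds 100..199 (100 rows), and [200, 300) holds 200..250 (51 rows).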
wildplasser