0

I'm using PostgreSQL v10.12

Table 1: Threshold with total rows ~800 (Each sensor's minimum and maximum threshold level is store in this table)

Description:

SensorName Varchar(10), 
MinLimit numeric, 
MaxLimit numeric

Data:

SensorName | MinLimit | MaxLimit
Sensor1    | 80       | 115
Sensor2    | 60       | 70
Sensor3    | 100      | 120
...
Sensor800  | 60       | 70

Table 2: IoTData with total columns ~800+ (Each sensor in Threshold table is the column in IoTData table)

Description:

IoTDateTime timestamp without time zone, 
sensor1 numeric(3)
sensor2 numeric(8,5)
sensor3 numeric(5,2)
....
Sensor800 numeric(5,2)

For every 5 minutes record will be generated in this table. This table is partitioned with date range (4 months data will be in a partition). Likewise there are 6 partitions in for this table so far.

*e.g:

        IoTDateTime         |   Sensor1
    2020-01-01 11:05:00     |      85
    2020-01-01 11:10:00     |      80
    2020-01-01 11:15:00     |      77
    ...
    2020-01-31 23:50:00     |      70
    2020-01-31 23:55:00     |      70

From Table 1 Threshold this Sensor1's MinLimit is 80 and MaxLimit is 115. Anything below MinLimit (80) or greater than MaxLimit (115) is considered as alert or sensor is malfunctioning.

I need to find the alert count of all 800+ sensors for each day of a month.

I wrote below function to perform this logic.

Need your help to rewrite this function in better way to reduce the line of code and optimistic logic. Thanks in advance :)

CREATE OR REPLACE FUNCTION public.udf_SensorFailedForeachDay
(   pimono integer,
    pstartdate timestamp without time zone,
    penddate timestamp without time zone)

    RETURNS TABLE(date_of_month double precision, Sensor1 double precision, Sensor2 double precision, 
    ...
    , Sensor800 double precision) 
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE 
    ROWS 1000
AS $BODY$

declare Sensor1min  numeric(3);declare Sensor1max  numeric(3);

declare Sensor2min  numeric(8,5);declare Sensor2max  numeric(8,5);
....
declare Sensor800min  numeric(5,2);declare Sensor800max  numeric(5,2);
BEGIN

select minlimit,maxlimit Into Sensor1min,Sensor1max from threshold where channelname='Sensor1';
select minlimit,maxlimit Into Sensor2min,Sensor2max from threshold where channelname='Sensor2';
...

select minlimit,maxlimit Into Sensor800min,Sensor800max from threshold where channelname='Sensor800';
Return query 
select extract(day from a.IoTDateTime) as date_of_month,
(cast(sum(case when a.Sensor1 between Sensor1min and Sensor1max then 1 end) as float)/cast(count(a.IoTDateTime) as float) )*100 as Sensor1,
(cast(sum(case when a.Sensor2 between Sensor2min and Sensor2max then 1 end) as float)/cast(count(a.IoTDateTime) as float))*100 as Sensor2,
...
(cast(sum(case when a.Sensor800 between Sensor800min and Sensor800max then 1 end) as float)/cast(count(a.IoTDateTime) as float))*100 as Sensor800
from IoTData a where a.IoTDateTime between Pstartdate and Penddate
group by extract(day from a.IoTDateTime) order by extract(day from a.IoTDateTime);

END;
$BODY$;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Unrelated to your question, but: you only need one `declare` section for all variables, not one for each variable. –  May 28 '20 at 12:32
  • You need to change your label. PLSQL is Oracle's proprietary language extension for sql. For Postgres it is plpgsql (extension language) or pslq (command line interface). – Belayer May 28 '20 at 16:25
  • Thanks. changed the label – A satheesh kumar May 28 '20 at 17:16
  • When i tried one declare for all variable got ERROR: syntax error at or near "," e.g: declare Sensor1min numeric(3),Sensor1max numeric(3); – A satheesh kumar May 29 '20 at 02:39
  • 1
    Keyword `DECLARE` should be used only once per block - please, start by reading documentation - https://www.postgresql.org/docs/current/plpgsql-declarations.html – Pavel Stehule May 29 '20 at 06:02
  • `I need to find the alert count ...` So you just need a *count* of offending sensors per day? Please be precise what you actually need. Maybe show an exact (minimum) example of the desired result. – Erwin Brandstetter May 29 '20 at 13:01
  • Thanks. Earlier I 've tried to use comma after datatype but that since i got error i've used multiple delare. (In MS SQL Server I've used comma) declare Sensor1min numeric(3), Sensor1max numeric(3); Later after reading above URL I've used semicolon declare Sensor1min numeric(3);Sensor1max numeric(3); and it worked fine. – A satheesh kumar May 29 '20 at 20:32

1 Answers1

0

If you have a fixed number of sensors and hardly ever update rows, storing data in IoTData with a single row per timestamp, like you did, can make sense to get minimum storage size and good performance.

However, you are getting close to the absolute maximum number of columns allowed, which is an indicator that you may be moving in the wrong direction. See:

If you later need to add / remove sensors, you have to adapt the table structure and everything that depends on it, including the function I am going to supply. A painful process. So, typically, it would be more reasonable to use a more generic model with one row per measurement (800 rows instead of just 1). That bloats storage, but is much more versatile.

That said, it can work like this:

CREATE TEMP TABLE threshold (
   sensorname varchar(10), 
   minlimit numeric, 
   maxlimit numeric
);

INSERT INTO threshold VALUES
  ('Sensor1', 80 , 115)
, ('Sensor2', 60 , 70)
, ('Sensor3', 100, 120)
-- more
;

CREATE TABLE iotdata (
   iotdatetime timestamp PRIMARY KEY
 , sensor1 numeric(3)
 , sensor2 numeric(8,5)
 , sensor3 numeric(5,2)
);

INSERT INTO iotdata VALUES
  ('2020-01-01 11:05:00', 85, 65, 110)
, ('2020-01-01 11:10:00', 86, 11, 109)  -- low
, ('2020-01-01 11:15:00', 77, 15, 666)  -- low + hi
;

TABLE threshold;
sensorname | minlimit | maxlimit
:--------- | -------: | -------:
Sensor1    |       80 |      115
Sensor2    |       60 |       70
Sensor3    |      100 |      120
-- pivot table threshold to match pivoted data
CREATE TABLE dim_threshold AS
SELECT *
FROM  crosstab(
 $$(
   SELECT 'min' AS dimension, sensorname, minlimit
   FROM   threshold
   ORDER  BY sensorname
   )
   UNION ALL
   (
   SELECT 'max' AS dimension, sensorname, maxlimit
   FROM   threshold
   ORDER  BY sensorname
   )$$
   , $$(SELECT unnest('{Sensor1,Sensor2,Sensor3}'::text[]))$$
   ) AS (dimension text, sensor1 numeric, sensor2 numeric, sensor3 numeric);

TABLE dim_threshold;

dimension | sensor1 | sensor2 | sensor3
:-------- | ------: | ------: | ------:
min       |      80 |      60 |     100
max       |     115 |      70 |     120
TABLE iotdata;
iotdatetime         | sensor1 |  sensor2 | sensor3
:------------------ | ------: | -------: | ------:
2020-01-01 11:05:00 |      85 | 65.00000 |  110.00
2020-01-01 11:10:00 |      86 | 11.00000 |  109.00
2020-01-01 11:15:00 |      77 | 15.00000 |  666.00
-- aux function to calculate percentage
CREATE FUNCTION f_calc_pct(bigint, bigint)
  RETURNS float LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT ($1 * 100)::float / $2';
-- main function
CREATE OR REPLACE FUNCTION udf_sensor_fail_per_day(pstartdate timestamp
                                                 , penddate   timestamp)
  RETURNS TABLE(the_day date, dimenstion text, sensor1 float, sensor2 float, sensor3 float) -- more?
  LANGUAGE sql ROWS 100 AS
$func$
WITH cte AS (
   SELECT iotdatetime::date AS the_day
        , count(*) AS ct
        , count(*) FILTER (WHERE i.sensor1 < min.sensor1) AS s1_min
        , count(*) FILTER (WHERE i.sensor2 < min.sensor2) AS s2_min
        , count(*) FILTER (WHERE i.sensor3 < min.sensor3) AS s3_min
      -- more ...
        , count(*) FILTER (WHERE i.sensor1 > max.sensor1) AS s1_max
        , count(*) FILTER (WHERE i.sensor2 > max.sensor2) AS s2_max
        , count(*) FILTER (WHERE i.sensor3 > max.sensor3) AS s3_max
      -- more ...
   FROM   iotdata i
   CROSS  JOIN (SELECT * FROM dim_threshold WHERE dimension = 'min') min
   CROSS  JOIN (SELECT * FROM dim_threshold WHERE dimension = 'max') max
   WHERE  iotdatetime >= pstartdate
   AND    iotdatetime <  penddate
   GROUP  BY 1
   )
SELECT the_day, 'min' AS dimension
     , f_calc_pct(s1_min, ct) -- AS s1
     , f_calc_pct(s2_min, ct) -- AS s2
     , f_calc_pct(s3_min, ct) -- AS s3
    -- more ...
FROM   cte
UNION ALL
SELECT the_day, 'max' AS dimension
     , f_calc_pct(s1_max, ct) -- AS s1
     , f_calc_pct(s2_max, ct) -- AS s2
     , f_calc_pct(s3_max, ct) -- AS s3
    -- more ...
FROM   cte;
$func$;
-- call
SELECT * FROM udf_sensor_fail_per_day('2020-01-01 00:00', '2020-01-11 00:00');
the_day    | dimenstion | sensor1          | sensor2          | sensor3         
:--------- | :--------- | :--------------- | :--------------- | :---------------
2020-01-01 | min        | 33.3333333333333 | 66.6666666666667 | 0               
2020-01-01 | max        | 0                | 0                | 33.3333333333333

db<>fiddle here

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228