1

I'm writing a PL/pgSQL function in PostgreSQL 9.3.10 to return who has attended certain classes/sessions from the following table:

Attendance
+-------+---------+---------+
| Class | Section |  Name   |
+-------+---------+---------+
|    1  |      1  | Amy     |
|    1  |      1  | Bill    |
|    1  |      2  | Charlie |
|    1  |      2  | Dan     |
|    2  |      1  | Emily   |
|    2  |      1  | Fred    |
|    2  |      2  | George  |
+-------+---------+---------+

What I want to do is, given a array of class/section id pairs (int[][]), return all people who are in those classes/sections. For example my_func(ARRAY[[1,1],[2,2]]) returns:

+-------+---------+---------+
| Class | Section |  Name   |
+-------+---------+---------+
|    1  |      1  | Amy     |
|    1  |      1  | Bill    |
|    2  |      2  | George  |
+-------+---------+---------+

If I knew the pairs beforehand, it would be a simple:

SELECT * FROM attendance 
WHERE ((class = 1 AND section = 1) OR (class = 2 AND section = 2));

Instead, the pairs will be a parameter of the function.

Right now, the only way I can think of doing this is to have the function essentially build an SQL query string by appending a bunch of WHERE clauses to the end of the query and then calling EXECUTE. Is there a better way to get my result?

EDIT: I implemented the suggestion @Erwin's suggestion and am currently able to get the results I want. Unfortunately, it appears as though it is fairly slow. Here is the function I am running:

CREATE OR REPLACE FUNCTION public.get_attendance(int[])
  RETURNS  TABLE(
    class_c int,
    section_c int
  )
AS
$BODY$
  BEGIN
    RETURN QUERY
      SELECT class, section
      FROM generate_subscripts($1, 1) as i 
      INNER JOIN attendance ON attendance.class = $1[i][1]
                            AND  attendance.section = $1[i][2];

  END;
$BODY$
LANGUAGE plpgsql VOLATILE;

Querying it like so:

SELECT *  FROM get_attendance(ARRAY[[1,15],[2,15],[3,8]]);

And I am getting the following EXPLAIN ANALYZE output

Merge Join  (cost=60.26..50139.72 rows=30840 width=8) (actual time=44.174..142.100 rows=25290 loops=1)
  Merge Cond: ((attendance.class = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1])) AND (attendance.section = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])))
  ->  Index Only Scan using class_section_idx on attendance  (cost=0.43..43372.25 rows=1233588 width=8) (actual time=0.009..86.625 rows=1145046 loops=1)
        Heap Fetches: 0
  ->  Sort  (cost=59.83..62.33 rows=1000 width=4) (actual time=0.010..0.757 rows=10031 loops=1)
        Sort Key: (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1]), (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])
        Sort Method: quicksort  Memory: 25kB
        ->  Function Scan on generate_subscripts i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.007 rows=3 loops=1)

The problem is that the query is scanning through all attendances in the attendance table without filtering them until the join. Is there any way to fix this?

Joe
  • 25
  • 1
  • 5

2 Answers2

4

Postgres 9.4 or later

There is a more elegant way now. Pass separate 1D-arrays for values of class and section, and unnest in parallel. Same array positions match up:

CREATE OR REPLACE FUNCTION f_attendance(_class_arr int[], _section_arr int[])
  RETURNS SETOF attendance
  LANGUAGE sql ROWS 10 STABLE AS
$func$
SELECT a.*
FROM   unnest(_class_arr, _section_arr) a(class, section)
JOIN   attendance a USING (class, section);
$func$;

Call:

SELECT * FROM f_attendance('{1,2}', '{1,2}');

See:

Postgres 9.3 (original answer)

You can achieve that with a simple SQL function. Key feature is the function generate_subscripts():

CREATE OR REPLACE FUNCTION f_attendance(_arr2d int[])
  RETURNS SETOF attendance
  LANGUAGE sql ROWS 10 STABLE AS
$func$
SELECT a.*
FROM   generate_subscripts($1, 1) i
JOIN   attendance a ON a.class   = $1[i][1]
                   AND a.section = $1[i][2];
$func$;

Call:

SELECT * FROM f_attendance(ARRAY[[1,1],[2,2]]);

Or the same with an array literal - which is more convenient in some contexts, especially with prepared statements:

SELECT * FROM f_attendance('{{1,1},{2,2}}');

The function always expects a 2D array. Even if you pass a single pair, nest it:

SELECT * FROM f_attendance('{{1,1}}');

Audit of your implementation

  1. You made the function VOLATILE, but it can be STABLE. The manual:

    Because of this snapshotting behavior, a function containing only SELECT commands can safely be marked STABLE.

    Related:

  2. You also use LANGUAGE plpgsql instead of sql, which makes sense if you execute the function multiple times in the same session. But then you must also make it STABLE or you lose that potential performance benefit. The manual once more:

    STABLE and IMMUTABLE functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.

  3. Your EXPLAIN output shows an Index Only Scan, not a sequential scan like you suspect in your comment.

  4. There is also a sort step in your EXPLAIN output that does not match the code you show. Are you sure you copied the right EXPLAIN output? How did you obtain it anyway? PL/pgSQL functions are black boxes to EXPLAIN. Did you use auto_explain? Details:

  5. The Postgres query planner has no idea how many array elements the passed parameter will have, so it is hard to plan the query and it may default to a sequential scan (depending on more factors). You can help by declaring the expected number of rows. If you typically don't have more than 10 items add ROWS 10 like I did now above. And test again.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you, your suggestion worked (both here and for the response below using the WHERE clauses). Unfortunately, it appears that the query is no longer filtering the attendances by the class/section prior to joining, resulting in a very long query time. I updated the question with the function, calls, and EXPLAIN/ANALYZE output that I am getting. – Joe Jan 05 '16 at 16:55
  • @JosephAmato: Consider my added audit. – Erwin Brandstetter Jan 05 '16 at 19:59
1

If you pass an array of records to the function it is simple:

with attendance (class, section, name) as(values
    (1, 1, 'Amy'),
    (1, 1, 'Bill'),
    (1, 2, 'Charlie'),
    (1, 2, 'Dan'),
    (2, 1, 'Emily'),
    (2, 1, 'Fred'),
    (2, 2, 'George')
)
select *
from attendance
where (class, section) = any(array[(1,1),(2,2)])
;
 class | section |  name  
-------+---------+--------
     1 |       1 | Amy
     1 |       1 | Bill
     2 |       2 | George
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • It appears as though Postgres does not allow you to pass in an array of records into a function. Is there any way I can either: a) convert the multidimensional array to an array of records within the function or b) perform the "=any(...)" on a multidimensional array? – Joe Jan 05 '16 at 00:54
  • @JosephAmato: You cannot pass an array of anonymous records to a function because that requires a well-known type. Create and work with a registered type instead: `CREATE TYPE int_pair As (a int, b int);` Now, `int_pair[]` can be the data type of your input parameter. – Erwin Brandstetter Jan 05 '16 at 03:25
  • This suggestion works, but is extremely slow for some reason. It appears as though the query does not want to do an index scan on (class, section), opting instead for a sequential scan through all the data. The data subsets are small enough that an index scan should definitely be used. – Joe Jan 05 '16 at 17:00
  • @Joseph Is there an index on `(class, section)`? – Clodoaldo Neto Jan 05 '16 at 19:34
  • I don't think Postgres 9.4 can use an index when you combine row-valued types with the `ANY` construct. At least in my tests on Postgres 9.4. Not sure why, though. Might be a bug. I don't have Postgres 9.5 here to test if that's been improved. – Erwin Brandstetter Jan 06 '16 at 06:26