How to check existence of multiple values at once

Question

I have a table in PostgreSQL with a unique index on a bytea column.

Currently, I am running this query to determine whether or not a value already exists on that column in the table.

-- mycolumn definition: mycolumn bytea unique not null
SELECT 1 FROM mytable WHERE mycolumn = @mycolumn;

When I get the result to a DataReader, I just check to see if it has rows and return true if so.

However, I need to check potentially hundreds of values in a batch, and this feels a bit slow to run per value due to the round trip overhead.

I've thought about just sending multiple statements:

SELECT 1 FROM mytable WHERE mycolumn = @mycolumn1;
SELECT 1 FROM mytable WHERE mycolumn = @mycolumn2;
SELECT 1 FROM mytable WHERE mycolumn = @mycolumn3;
-- etc

Here, I would see if reader has rows like before, and then call NextResult() to go to the next query result. Of course, I'll have to generate this query and add the parameters. Is this the fastest way to check existence of multiple values at a time in pgsql?

I also considered a scheme using EXISTS and UNIONing the queries together, but the documentation says that UNION does not preserve ordering. So then I would have to also provide an order number as another parameter in the query so I could reorder the results on the server. This seems more wasteful than above... if it works.

Update

~~I just found that according to the docs, returning multiple result sets is not supported in PG. So above won't work. So what is the best way?~~

The command string can include multiple SQL commands (separated by semicolons). [...] Note however that the returned PGresult structure describes only the result of the last command executed from the string.

Apparently the above only refers to functions.

If you read the [postgresql documentation](https://www.postgresql.org/files/documentation/pdf/9.5/postgresql-9.5-A4.pdf#174) you will find that you can query using `SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)` — surfmuggle, Jun 12 '16 at 05:00
It seems that you deleted your comment. You can combine [creating a temp table](http://stackoverflow.com/questions/15691243/) with [inserting values into a temp table](http://dba.stackexchange.com/questions/86724/) — surfmuggle, Jun 12 '16 at 05:12
@smurfmuggle: Deleted comment to clarify. The problem with that: in order to find out which ones exist with the `IN` query, I would have to return the `bytea` column back, then do a comparison on the server to filter them out. Seems like a lot of data to pass and compare. Mind you, the data is not overly large, but it is too large to fit in primitive types including UUID, hence the bytea. — Kasey Speakman, Jun 12 '16 at 05:12
I would combine the two approaches above to return a result that looks like this `byteValue, countOfByte` maybe using `case`. This way there should be not back and forth. But since i do not know what are you doing on the "code-side" with the datareader it is difficult to give you a hint about other approaches. — surfmuggle, Jun 12 '16 at 05:41
PostgreSQL definitely supports multiple result sets, what's not supported is functions returning multiple result sets, AFAIK. So you can definitely send many selects and read the results. — Shay Rojansky, Jun 12 '16 at 21:14

Patrick · Answer 1 · 2016-06-20T03:24:13.213

If all you have to go by is the bytea values and no id or so, then you can solve it with an array of values and preserving ordinality when processing (presumably and untested, I am not an npgsql user myself):

SELECT coalesce(mt.present, false) 
FROM unnest(@values) WITH ORDINALITY x (v, ord)
LEFT JOIN LATERAL (SELECT true AS present FROM mytable WHERE mycolumn = x.v) mt ON true
ORDER BY x.ord;

You pass in an array @values of bytea values you want to test. When unnesting the array you use the WITH ORDINALITY clause so that you can preserve the order. Then join mytable to the unnested array values and check which ones have indeed a corresponding value. You will get a number of rows back equal to the length of the array, each containing a scalar value of boolean type. The ordering by ordinality ensures that rows are returned in the order of values in the array.

If you prefer to get an array of boolean back then you should wrap the query in an outer query where you array_agg(present) the coalesce(mt.present, false) AS present clause (so use an alias in the sub-query).

I consider this a valuable answer as I learned a lot. You have my upvote. — Kasey Speakman, Jun 15 '16 at 02:20
I finally got around to trying this since the "easy" way was crashing intermittently for me. You did forget the `ON` clause after the join, which is a syntax error. I just did `ON true` since the inner query should only find 0-1 rows. — Kasey Speakman, Jun 19 '16 at 23:57

score 1 · Accepted Answer · answered Jun 13 '16 at 20:57

1

You can use your original idea with Npgsql. It will do everything in one roundtrip. You will be able to get the results as well.

answered Jun 13 '16 at 20:57

Emil

16,784
2
41
52

I will give it a try later. For now, I simply issue 1000 of my original query in parallel. Npgsql/Postgres handle it like champs. It would be more efficient to do it in one. – Kasey Speakman Jun 13 '16 at 21:13
This worked. With the parallel queries, I was eventually running into max connection limits, and it was less efficient (less observed op/s). – Kasey Speakman Jun 15 '16 at 02:18

krokodilko · Answer 3 · 2016-06-12T07:20:15.743

-1

What about building a dynamic query like this:

SELECT x FROM (
    select unnest( 
        ARRAY[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,
              21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,30]
    ) as x
) x
WHERE x NOT IN ( SELECT my_column FROM my_table )

This requires only one roundtrip to the server.
You can pack several dozen numbers in one query and check them all at once.
On the application side just read missing numbers directly from the resultset.

Preasumbly, it should be also possible to create a preparedstatement with only one argument of array type, and bind an array of values to this argument instead of building a dynamic query, but I've never done it and don't know how to do it. You can do a further research on this topic.

edited Jun 12 '16 at 07:20

answered Jun 12 '16 at 06:44

krokodilko

35,300
7
55
79

You create an array to immediately unnest it? That is silly, use a `VALUES` clause instead. Furthermore, test is for presence of the data, not for the data itself. – Patrick Jun 12 '16 at 07:20
@Patrick you are absolutely right, this is uterlly silly and probably the worst solution. You are free to post your own answer with a solution using `VALUES` clause. – krokodilko Jun 12 '16 at 07:24

How to check existence of multiple values at once

Update

3 Answers3