26

I used to have a query like in Rails:

MyModel.where(id: ids)

Which generates sql query like:

SELECT "my_models".* FROM "my_models"
WHERE  "my_models"."id" IN (1, 28, 7, 8, 12)

Now I want to change this to use ANY instead of IN. I created this:

MyModel.where("id = ANY(VALUES(#{ids.join '),('}))"

Now when I use empty array ids = [] I get the folowing error:

MyModel Load (53.0ms)  SELECT "my_models".* FROM "my_models"  WHERE (id = ANY(VALUES()))
ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
Position: 75: SELECT "social_messages".* FROM "social_messages"  WHERE (id = ANY(VALUES()))
    from arjdbc/jdbc/RubyJdbcConnection.java:838:in `execute_query'
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Eki Eqbal
  • 5,779
  • 9
  • 47
  • 81
  • 1
    If you're going to write custom queries, please be **very** careful to use placeholders: `VALUES(?)` expanded as necessary with an array to bind is way better than what you have here. You need to be careful to properly escape any raw values being injected into your SQL. What's the purpose of this query? – tadman Jul 02 '15 at 18:24
  • in my real query i have so many IDs at the IN part so i want to optimise it using ANY in this case. – Eki Eqbal Jul 02 '15 at 18:31
  • 2
    In PostgreSQL, `IN` is an alias for `= ANY` – PinnyM Jul 02 '15 at 18:34
  • @PinnyM are you sure about this? I've been reading that we can dramatically improve the performance using ANY instead of IN (https://www.datadoghq.com/blog/2013/08/100x-faster-postgres-performance-by-changing-1-line/). – Eki Eqbal Jul 02 '15 at 18:42
  • 1
    Where are all these ids coming from? Maybe you should be JOINing or using a subquery instead of sending a big list of ids to the database. – mu is too short Jul 02 '15 at 18:53
  • 2
    @PinnyM: `IN` is ***not*** an alias for `= ANY`. I provided details. – Erwin Brandstetter Jul 02 '15 at 19:21
  • @EkiEqbal the article you've linked explains that using literal value lists is faster that using the Array type, not that ANY is faster than IN (ANY can handle both Arrays and value lists). – PinnyM Jul 03 '15 at 14:16
  • @ErwinBrandstetter: you are correct, the docs use the term 'equivalent', and not 'alias' - my bad. However, since they are supposedly equivalent, do they actually have different query plans? If they do, I'm curious why the faster approach isn't always used. And if not, then why bother in this case? – PinnyM Jul 03 '15 at 14:25
  • @PinnyM: Actually, `IN` is rewritten with `= ANY` internally. I added details to my answer. – Erwin Brandstetter Jul 03 '15 at 16:06

1 Answers1

61

There are two variants of IN expressions:

Similarly, two variants with the ANY construct:

A subquery works for either technique, but for the second form of each, IN expects a list of values (as defined in standard SQL) while = ANY expects an array.

Which to use?

ANY is a later, more versatile addition, it can be combined with any binary operator returning a boolean value. IN burns down to a special case of ANY. In fact, its second form is rewritten internally:

IN is rewritten with = ANY
NOT IN is rewritten with <> ALL

Check the EXPLAIN output for any query to see for yourself. This proves two things:

  • IN can never be faster than = ANY.
  • = ANY is not going to be substantially faster.

The choice should be decided by what's easier to provide: a list of values or an array (possibly as array literal - a single value).

If the IDs you are going to pass come from within the DB anyway, it is much more efficient to select them directly (subquery) or integrate the source table into the query with a JOIN (like @mu commented).

To pass a long list of values from your client and get the best performance, use an array, unnest() and join, or provide it as table expression using VALUES (like @PinnyM commented). But note that a JOIN preserves possible duplicates in the provided array / set while IN or = ANY do not. More:

In the presence of NULL values, NOT IN is often the wrong choice and NOT EXISTS would be right (and faster, too):

Syntax for = ANY

For the array expression Postgres accepts:

  • an array constructor (array is constructed from a list of values on the Postgres side) of the form: ARRAY[1,2,3]
  • or an array literal of the form '{1,2,3}'.

To avoid invalid type casts, you can cast explicitly:

ARRAY[1,2,3]::numeric[]
'{1,2,3}'::bigint[]

Related:

Or you could create a Postgres function taking a VARIADIC parameter, which takes individual arguments and forms an array from them:

How to pass the array from Ruby?

Assuming id to be integer:

MyModel.where('id = ANY(ARRAY[?]::int[])', ids.map { |i| i})

But I am just dabbling in Ruby. @mu provides detailed instructions in this related answer:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    thanks man, this is WAY better :) aw how would I implement this in rails and activemodel? Thanks – Eki Eqbal Jul 02 '15 at 20:14
  • Your solution working perfect except when I use empty array I get: ActiveRecord::StatementInvalid (ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: operator does not exist: integer = text Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. – Eki Eqbal Jul 03 '15 at 13:36
  • 1
    @EkiEqbal as Erwin stated above, add `::numeric[]` after the Array to avoid this problem. – PinnyM Jul 03 '15 at 14:14
  • @EkiEqbal: ... or `::int[]` in this case. – Erwin Brandstetter Jul 03 '15 at 15:28
  • 2
    @ErwinBrandstetter considering that OP was trying to improve performance by using value lists instead of Arrays (on the assumption that [the linked article in above comments](https://www.datadoghq.com/blog/2013/08/100x-faster-postgres-performance-by-changing-1-line/) still holds correct), the answer should use the format `VALUES (...), (...), ...` instead of `ARRAY[...]`. Or did I miss something? And if the article is no longer correct (or never was), then there doesn't really seem to be a point to this exercise... – PinnyM Jul 03 '15 at 17:52
  • @PinnyM: Seems like I missed your additional comment last year. You certainly got a point. I added a bit to address that above. – Erwin Brandstetter Feb 17 '16 at 15:53
  • @ErwinBrandstetter I tried your solution for ALL operator. the query is right. but it is not working as expected. Can you help me ? – Vishal Jun 28 '17 at 13:26
  • @Vishal: Maybe. Ask your question as *question*. You can always link to this one for context and / or comment here with a link to the new question. – Erwin Brandstetter Jun 29 '17 at 01:59
  • @ErwinBrandstetter I posted question from my friend's account. here is link https://stackoverflow.com/questions/44816759/all-operator-in-where-clause-in-rails It will be most helpful if you give solution. – Vishal Jun 29 '17 at 05:09
  • For those that come by this answer like I did and find the discussion very useful - the datadog article linked by @PinnyM has changed to the following: https://www.datadoghq.com/blog/100x-faster-postgres-performance-by-changing-1-line/ – Biggert Nov 09 '17 at 16:48