
I am executing a query against a PostgreSQL database from a Java program, and the query looks like:

SELECT ... FROM table_name WHERE column_name IN (<list of values>);

The problem is that the list of values is created dynamically and there is no limit to its size. When this list gets too big, I get a PSQLException with the message "stack depth limit exceeded". My question is: how do I determine how many arguments I can use in the IN clause before this exception happens? Say `max_stack_depth` is 2MB; can I use that to determine how many arguments the IN clause can take?
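
Roughly, this is how the statement is built (a simplified sketch; `table_name` and `column_name` are placeholders, and `SELECT *` stands in for my real column list):

import java.sql.*;
import java.util.List;

// Simplified sketch of how the failing statement is built.
class InListQuery {
    static ResultSet queryByList(Connection conn, List<String> values) throws SQLException {
        StringBuilder sql = new StringBuilder(
                "SELECT * FROM table_name WHERE column_name IN (");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sql.append(", ");
            // quote each value, doubling embedded single quotes
            sql.append('\'').append(values.get(i).replace("'", "''")).append('\'');
        }
        sql.append(')');
        // With a huge number of values, the server's parser exceeds its
        // configured stack depth and the driver throws the PSQLException.
        return conn.createStatement().executeQuery(sql.toString());
    }
}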

Nawar Khoury
  • Did you try `column_name IN("") OR column_name IN("")`? Does it let you pass more items? I used this trick to work around a problem in Oracle. I am not sure if it is going to work with PostgreSQL. – Sergey Kalinichenko Sep 15 '14 at 12:59
  • Do the values come from the same database, even if from other tables? – Serge Ballesta Sep 15 '14 at 13:05
  • You are right, this can be helpful, and I eventually want to do something like that. But my problem is what the maximum allowed list length is before I exceed the stack depth limit, i.e. calculating the threshold at which I should break my list in two, and I couldn't really find anything helpful on the internet about it... can you help me with it? – Nawar Khoury Sep 15 '14 at 13:05
  • Serge, if the values came from the same database I would have used a join, but actually the values are read from an external file... so there is no way to join, unless I insert them into a temporary table, and I am trying to avoid that because I don't think the performance would be very good. – Nawar Khoury Sep 15 '14 at 13:07
  • Please show the full exception text, and your PostgreSQL version from `SELECT version()`. – Craig Ringer Sep 15 '14 at 13:31
  • Dear Craig, here is the Postgres version: PostgreSQL 8.4.20 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit – Nawar Khoury Sep 15 '14 at 13:58
  • And as for the exception: Caused by: org.postgresql.util.PSQLException: ERROR: stack depth limit exceeded at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1531) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1313) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354) – Nawar Khoury Sep 15 '14 at 14:00

3 Answers


Do an inner join to a `values` list:

select ...
from
    table_name
    inner join
    (values
        (first_value), (second_value)
    ) s(column_name) using (column_name)
;

Or, for easier string building, join to an unnested array:

select ...
from
    table_name
    inner join
    unnest(
        array[first_value, second_value]
    ) s(column_name) using (column_name)
;
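
For what it's worth, a sketch of driving the `unnest` variant from JDBC with a single array parameter instead of building the string, assuming a JDBC4 driver that supports `createArrayOf` (the `queryWithArray` name and the `text` element type are my assumptions; `table_name`/`column_name` are the placeholders from the question):

import java.sql.*;

// Sketch: ship the whole list as one text[] parameter and join against
// unnest() of it, so the statement text no longer grows with the list.
class UnnestJoinQuery {
    static void queryWithArray(Connection conn, String[] values) throws SQLException {
        String sql = "SELECT * FROM table_name "
                   + "INNER JOIN unnest(?) AS s(column_name) USING (column_name)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // createArrayOf builds a server-side text[] from the Java array
            ps.setArray(1, conn.createArrayOf("text", values));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("column_name"));
                }
            }
        }
    }
}

Since the statement stays a constant size, the parser never has to chew through the individual values.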
Clodoaldo Neto
  • A `values` list is likely to be a lot faster, though. For very big lists you can even create a temp table. I wrote some more about it a while ago - see http://stackoverflow.com/a/17038097/398670 – Craig Ringer Sep 15 '14 at 13:27
  • That's a brilliant solution... I hadn't thought of this... thank you very much :) Although I still don't know the limit of the list for the `IN` clause, changing my query into this is actually better than dividing it into many lists or inserting into a temporary table :) I am going to use this solution... thanks ;) – Nawar Khoury Sep 15 '14 at 14:10

If you are using 7.4 or earlier, it depends on the `max_expr_depth` setting (replaced by `max_stack_depth` in newer versions, I believe).

You can always check its value:

test=> SHOW max_expr_depth;

And change it:

test=> SET max_expr_depth TO 10;

Then you can operate on this value.
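
On newer versions (including the 8.4 from the question), the analogous check and change would presumably be `max_stack_depth` (note the value carries a unit, and raising it normally requires superuser rights):

test=> SHOW max_stack_depth;
test=> SET max_stack_depth TO '4MB';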

Found in the PostgreSQL documentation:

The default value of 10000 is high enough for any normal query, but you can raise it if needed. (But if you raise it too high, you run the risk of server crashes due to stack overflow.)

This is not the best solution for your problem, but I believe it answers your question.

Kamil Kłys
  • ... but if you're using 7.4 or earlier, you're probably also half-way to fossilised, though. – Craig Ringer Sep 15 '14 at 13:31
  • That is why I mentioned the `max_stack_depth` property (the replacement for `max_expr_depth`). – Kamil Kłys Sep 15 '14 at 13:32
  • Thanks a lot, guys, for your help. I actually increased `max_stack_depth` as a temporary solution, but now I am trying to optimize my program so I don't need that increase... that's why I wanted to know the maximum number of allowed parameters for the `IN` clause... but really, thanks, I appreciate the help :) – Nawar Khoury Sep 15 '14 at 14:13

When you fear that there might be a huge number of values in the `IN` clause, you can use a temporary table.

You create a temporary table with one column and an index on that column. Before the query, you insert your data for the `IN` clause as rows into the temporary table. Then you execute the query using one of the following constructs:

SELECT * FROM table_name WHERE column_name in (SELECT column_name FROM temptable);
SELECT * FROM table_name a JOIN temptable b ON (a.column_name = b.column_name);
SELECT * FROM table_name a WHERE EXISTS (SELECT * FROM temptable t WHERE a.column_name=t.column_name);

There won't be a problem with parallel running queries: contents written to a temporary table by one session are invisible to any other session.

The big advantage of this approach is that the SQL parser does not have to process all your values. The structure of the query stays the same, and you can even use a prepared statement. Together with the use of an index, this can be a huge performance boost.
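
A minimal sketch of that flow from Java, using the `temptable` name and columns from the queries above (the batching, the index name, and `ON COMMIT DROP` are my own choices):

import java.sql.*;
import java.util.List;

// Sketch: load the IN-list values into a temporary table, then join.
class TempTableLookup {
    static void query(Connection conn, List<String> values) throws SQLException {
        conn.setAutoCommit(false); // ON COMMIT DROP needs an explicit transaction
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TEMPORARY TABLE temptable (column_name text) ON COMMIT DROP");
            st.execute("CREATE INDEX temptable_idx ON temptable (column_name)");
        }
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO temptable (column_name) VALUES (?)")) {
            for (String v : values) {
                ins.setString(1, v);
                ins.addBatch();
            }
            ins.executeBatch(); // one round trip for the whole list
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT a.* FROM table_name a JOIN temptable b ON (a.column_name = b.column_name)");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process each matching row here
            }
        }
        conn.commit(); // the temporary table is dropped at commit
    }
}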

stefan.schwetschke
  • The cost of creating the index is bigger than that of a sequential scan, and the probability of it being used is small. – Clodoaldo Neto Sep 15 '14 at 13:41
  • @ClodoaldoNeto With lots of rows in the table and very little RAM, the index might be used and might be faster (using a merge join, since the index is already sorted). Alternatively, a bitmap index scan might be useful. – stefan.schwetschke Sep 15 '14 at 13:46