1

In a PostgreSQL table I have several information stored as text. It depends on the context described by a type column what type of information is stored. The application is prepared to get by only one command the Id's of the row.

I got into trouble when i tried to compare the information (bigint stored as a string) with an external value (e.g. '9' > '11'). When I tried to cast the column, the datatbase return an error (not all values in the column are castable, e.g. datetime or normal text). Also when I try to cast only the result of a query command, I get a cast error.

I get the table with the castable rows by this command:

SELECT information.id as id, item.information::bigint as item
FROM information
INNER JOIN item
ON information.id = item.informationid
WHERE information.type = 'task'

The resulting rows are showing up only text that is castable. When I throw it into another command it results in an error.

SELECT x.id FROM (
    SELECT information.id as id, item.information::bigint as item
    FROM information
    INNER JOIN item
    ON information.id = item.informationid
    WHERE information.type = 'task'
) AS x
WHERE x.item > '0'::bigint

Accroding to the error, the database tried to cast all rows in the table.

minmer
  • 33
  • 5

2 Answers2

2

Technically, this happens because the optimizer thinks WHERE x.item > '0'::bigint is a much more efficient filter than information.type = 'task'. So in the table scan, the WHERE x.item > '0'::bigint condition is chosen to be the predicate. This thinking is not wrong but will make you fall into this seemingly illogical trouble.

The suggestion by Gordon to use CASE WHEN inf.type = 'task' THEN i.information::bigint END can avoid this, but however it may sometimes ruin your idea to put that as a sub-query and require the same condition to be written twice.

A funny trick I tried is to use OUTER APPLY:

SELECT x.* FROM (SELECT 1 AS dummy) dummy
OUTER APPLY (
    SELECT information.id as id, item.information::bigint AS item
    FROM information
    INNER JOIN item
    ON information.id = item.informationid
    WHERE information.type = 'task'
) x
WHERE x.item > '0'::bigint

Sorry that I only verified the SQL Server version of this. I understand PostgreSQL has no OUTER APPLY, but the equivalent should be:

SELECT x.* FROM (SELECT 1 AS dummy) dummy
LEFT JOIN LATERAL (
    SELECT information.id as id, item.information::bigint AS item
    FROM information
    INNER JOIN item
    ON information.id = item.informationid
    WHERE information.type = 'task'
) x ON true
WHERE x.item > '0'::bigint

(reference is this question)

Finally, a more tidy but less flexible method is add the optimizer hint to turn off it to force the optimizer to run the query as how it is written.

COY
  • 684
  • 3
  • 10
  • Another funny trick: in the PostgreSQL `(select 1 AS dummy) dummy` could be simplified to `(select) dummy` ;) – Abelisto Mar 18 '20 at 15:59
  • Oh, that's great to know! I never used `(select)`...But I am just keeping it in case some Oracle people need to take it away :P – COY Mar 18 '20 at 16:05
  • For me the answer of @Gordon Linoff and COY are up to now sufficient. Thanks! – minmer Mar 18 '20 at 16:18
1

This is unfortunate. Try using a case expression:

SELECT inf.id as id,
       (CASE WHEN inf.type = 'task' THEN i.information::bigint END) as item
FROM information inf JOIN
     item i
     ON inf.id = i.informationid
WHERE inf.type = 'task';

There is no guarantee that the WHERE filter is applied before the SELECT. However, CASE does guarantee the order of evaluation, so it is safe.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    The above is equivalent to "select inf.d as id, i,information::bigint as item". This results from thisthe WHERE condition havng eliminated all inf.type values that are not 'task'. Since WHERE is always processed before the SELECT list the WHEN condition is always true by time it is evaluated. – Belayer Mar 18 '20 at 19:08
  • @Belayer . . . There is not a guarantee that `where` is processed *before* `select` -- or, if there is, please point me to the documentation. – Gordon Linoff Mar 18 '20 at 21:53
  • @GordenLinoff OK, I admit I cannot find any official documentation either Postgres nor SQL Standards (don't buy it). But a couple good 'references' [here](https://academy.vertabelo.com/blog/sql-order-of-operations/) and [here](https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/). But I stand on my contention the result with CASE is the same as direct conversion. If the WHERE executes first the the CASE never sees anything else, if it runs after the select list then it discards any where CASE returned null. Unless the row to be discard causes an exception. – Belayer Mar 18 '20 at 23:12
  • @Belayer . . . Not at all. The `CASE` only evaluates the `THEN` for the first true `WHEN` that is encountered. That is how `CASE` expressions are defined in the SQL standard. – Gordon Linoff Mar 19 '20 at 00:19