1

I have a basic query that fetches rows from data dumped from google autocomplete containing the place_id, formatted address and a json column which contains the entire json object. I believe I am using Postgres 10 or 11.

WITH 
    t AS (
        SELECT place_id,formatted,full_json 
        FROM addresses_autocomplete 
        WHERE json_array_length(full_json) > 7
    ), 
    r AS (
        SELECT short_name FROM regions WHERE regions.country_id = 1
    )
SELECT DISTINCT ON(place_id) 
    t.full_json->5->'short_name' AS state, 
    t.full_json->7->'long_name' as postal_code, 
    full_json->3->'short_name' AS city 
FROM t 
INNER JOIN r ON r.short_name = t.full_json->5->>'short_name' 
WHERE t.formatted=$1;

Is there any way to write my query so that t.full_json->5->'short_name' does not get repeated?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
GrandFleet
  • 809
  • 2
  • 10
  • 25
  • You could use [`LATERAL`](https://stackoverflow.com/a/36530228/5070879) – Lukasz Szozda Sep 28 '19 at 20:47
  • No. Look at https://www.periscopedata.com/blog/sql-query-order-of-operations or https://www.designcise.com/web/tutorial/what-is-the-order-of-execution-of-an-sql-query for example. `from` and `where` clauses executes before `select`. In some (most?) DBMSs you can use aliases from `select` (or just column numbers) in `order by` and `group by` clauses. – Abelisto Sep 28 '19 at 20:53
  • There is [no Postgres version 9.11](https://www.postgresql.org/support/versioning/). And there is no need to believe. Use `SELECT version();`. – Erwin Brandstetter Sep 28 '19 at 23:05
  • Well I couldn't recall what version I had been using. – GrandFleet Sep 28 '19 at 23:20
  • Are you not using it anymore? The arsenal of available tool depends on the Postgres version. It's just a coincidence that there is a solution for this with very basic tools. – Erwin Brandstetter Sep 28 '19 at 23:30

1 Answers1

0

Like @Abelisto commented, you cannot refer to SELECT list expressions from the WHERE (or JOIN) clause of the same command because expressions in the WHERE clause are evaluated first. Consider the sequence of events:

You need a subquery or CTE to avoid repeating the expression. However, your expressions are not exactly the same to begin with. You extract a simple text as json and use the same as actual text in the JOIN clause. The extraction is tricky by itself. Consider:

test=# SELECT json '"foo"' #>> '{}';
 ?column?
----------
 foo

#>> is the operator to ...

Get JSON object at specified path as text

Passing an empty array as path ({}) extracts text from the simple json text. Related:

Now, to avoid repeating the extract path, extract state in the CTE or in a subquery and apply the above expression in the outer SELECT:

SELECT DISTINCT ON (a.place_id)
       a.state
     , a.full_json -> 7 -> 'long_name'  AS postal_code
     , a.full_json #> '{3,short_name}'  AS city    -- alternative syntax, btw.
FROM  (
   SELECT full_json -> 5 -> 'short_name' AS state  -- once!
        , full_json
   FROM   addresses_autocomplete
   WHERE  json_array_length(full_json) > 7
   AND    formatted = $1
   ) a
JOIN   regions r ON r.short_name = t.state #>> '{}'  -- key element!
WHERE  r.country_id = 1
-- ORDER  BY place_id, ??? -- put expression(s) here to get deterministic pick

Equivalent to your original. While being at it I simplified the query, inlined the CTEs (don't use CTEs unless needed before Postgres 12, they typically add cost) and added ORDER BY, which will be required to get a deterministic result.

About DISTINCT ON & ORDER BY:


But are you sure you want postal_code, city and state as type json? That's an odd requirement.

Also, the above query may not produce the best query plan. The key to performance is to support the most selective predicates with optimal indexes. Repeating an expression hardly matters. You might want this instead:

SELECT DISTINCT ON (a.place_id)
       a.full_json #>> '{5,short_name}' AS state
     , a.full_json #>> '{7,long_name}'  AS postal_code
     , a.full_json #>> '{3,short_name}' AS city
FROM   addresses_autocomplete a
JOIN   regions r ON r.short_name = a.full_json #>> '{5,short_name}'  -- back to repeating
WHERE  a.formatted = $1
AND    json_array_length(a.full_json) > 7
AND    r.country_id = 1
-- ORDER  BY a.place_id, ???
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228