1

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.

All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.

As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German fedaral) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.

My mapping date is in a table like this:

CREATE TABLE zip2state (
    state TEXT NOT NULL,
    range_start CHARACTER VARYING(5) NOT NULL,
    range_end CHARACTER VARYING(5) NOT NULL
)

Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.

Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level). Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)

Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...

For completeness' sake, here's the subquery for the zip codes:

    (select array_agg(distinct address.zipcode)
      from  affiliation
      join  company
        on  affiliation.ins_id = company.id
      join  address
        on  address.com_id = company.id
     where  affiliation.per_id = person.id) AS zipcodes,
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
mhd
  • 1,339
  • 1
  • 13
  • 14

1 Answers1

1

I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:

SELECT p.*, z.*
FROM   person p
LEFT   JOIN LATERAL (
   SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
        , array_agg(DISTINCT z.state)   AS states
   FROM   affiliation    a
   -- JOIN   company     c ON a.ins_id = c.id  -- suspect you don't need this
   JOIN   address        d ON d.com_id = a.ins_id  -- c.id
   LEFT   JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
   WHERE  a.per_id = p.id
   ) z ON true;

If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.

Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.

Related:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Awesome, not only did you answer my question, but you also helped to optimize the query and made me more aware of a new SQL feature that I can exploit in the future. Geeze, a simple `LEFT JOIN` to get everything within the ranges, I was really over-thinking it this time... – mhd Sep 18 '15 at 09:57