0

I see that PostgreSQL array is good for performance if the array's element is the data itself, e.g., tag

http://shon.github.io/2015/12/21/postgres_array_performance.html

How about if I use array as a way to store foreign keys of integer? Barring foreign key constraint problem, is it advisable to store foreign keys with integer array?

Apps should optimize for report or analytics. So if the app will end up joining the array to table most of the time, say the app need to show the label/title/name of the foreign key, is it still OK to use array for storage of foreign keys?

Would the performance be better when array is smallish as compared to using a junction table, say checkboxes of movie genres integer?

How about if the array is in thousands, would the performance be better when not using array and just use junction table instead?

user4157124
  • 2,809
  • 13
  • 27
  • 42
Green Lantern
  • 858
  • 10
  • 21

1 Answers1

4

No, storing FKs in an array is never a good idea for general purpose tables. First an foremost, there is the fact you mentioned in passing: Foreign key constraints for array elements are not implemented (still true for Postgres 14). This alone should void the idea.

There was an attempt to implement the feature for Postgres 9.3 that was stopped by serious performance issues. See this thread on pgsql-hackers.

Also, while read performance can be improved with arrays for certain use cases, write performance plummets. Think of it: To insert, update or delete a single element from a long array, you'd have to write a new row version with the whole array. I see serious lock contention ahead, too.

If your table is read only, the idea starts to make more sense. But then I would consider a materialized view with de-normalized arrays on top of a normalized many-to-many implementation. See:

While being at it, the MV can include all join tables and produce one flat table for even better read performance (for typical use cases). This way you get referential integrity and good read (and write) performance - at the cost of the overhead and additional storage for managing the MV.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • A possible exception are the system catalogs, where the FK constraints are represented by two *parallel* arrays. But: these arrays are very small, and updates on catalogs are not that frequent and performed by *internal* logic, not directly by DML-statements. – wildplasser Jul 02 '16 at 14:28
  • 1
    I must admit, I find it idiomatic and intuitive that the model is array and then store the model as an array too. However, only speed would seal the deal. If array cannot deliver the speed, I will not use it. I still yet to find a benchmark where the foreign keys (not tags-like element) that are implemented as array and joined to reference table, resulted to faster-performing query as compared to junction table implementation. – Green Lantern Jul 03 '16 at 01:11