25

I'm modeling many-to-many relationship where the relationship is accessed most of the time from one side only. It's more like a hierarchy, that is accessed top-down and not the other way around.

Survey has and belongs to many Questions has and belongs to many Answers.

Both relationships must be many-to-many because a same question can be re-used across different surveys and same answer in many questions. This is a requirement.

The standard M2M implementation would use two junction tables, surveys_questions and questions_answers. Instead, I'm thinking about using PostgreSQL's integer arrays to store question_ids in Survey and answer_ids in Question.

We can utilize the ANY operator to query all rows matching the foreign key array.

How would we query for all the surveys with their questions and questions's answers using SQL?

How can we match the order of the rows returned with the foreign key array? ie. using question_ids = [1,2,3] is guaranteed to return question rows with the order 1, 2, 3.

How does this perform performance wise compared to junction tables (assuming proper indexes, whatever they might be)?

Would you suggest this? Are there some resources about modeling M2M like this?

Update

There was a proposal to add referential integrity for array foreign keys to PostgreSQL 9.3, but it didn't get included: http://blog.2ndquadrant.com/postgresql-9-3-development-array-element-foreign-keys/

SO question about maintaining order using foreign key array PostgreSQL JOIN with array type with array elements order, how to implement?

Old Pro
  • 24,624
  • 7
  • 58
  • 106
randomguy
  • 12,042
  • 16
  • 71
  • 101
  • you say many to many , but this sounds like one to many; many to many would mean that each survey relates to several questions and each question relates to several surveys, but that sounds a little odd, certainly, the way you phrased it 'has-many' is normally synonymous with one to many (many-to-many is usually called 'has-and-belongs-to-many') – SingleNegationElimination Dec 12 '12 at 13:13
  • @TokenMacGuy: Sorry for the confusion. Questions are re-usable across surveys and answers across questions making the relationships many-to-many. I'll replace the has many -relationship with HABTM. – randomguy Dec 12 '12 at 13:58

1 Answers1

8

Use the junction table approach. The array method is non-standard enough that you have to ask questions about how much it would work, whereas the other is completely standard.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • 5
    junction tables can be significantly slower than arrays though. see https://gist.github.com/joevandyk/031cf5812bd656887623 – Joe Van Dyk Jul 02 '13 at 22:40
  • Yes, sometimes there is a performance gain to be had from the array method. There's a big question over whether it would hold up for all situations of course, and one of the primary problems is that adding/removing a new link requires a modification on a potentially long row (coupon, in the example) rather than insertion/deletion of a single row, and a lock on the coupon table. – David Aldridge Jul 03 '13 at 06:59
  • Agreed! If you didn't want to modify the coupons table on every insert, you could have a coupons_products_array table (coupon_id, product_ids[]) instead. But that might be getting silly. – Joe Van Dyk Jul 03 '13 at 22:43
  • @JoeVanDyk Joining is slow, better to query a query. In my test, there is no performance difference between array and querying a query, both 0.075 ms. Array's planning time is still faster though, 0.211 ms vs 0.769 ms. https://gist.github.com/joevandyk/031cf5812bd656887623#gistcomment-1818413 – Michael Buen Jul 05 '16 at 03:49
  • 1
    Maintaining an array sounds like a locking nightmare for high-volume tables. – Alkanshel Sep 29 '17 at 20:29
  • @Amalgovinus yes it is certainly a point of guaranteed serialisation. – David Aldridge Sep 30 '17 at 13:02