2

I have been wondering whether it's possible to do multidimensional tables in PostgreSQL. Here's an example table from my project:

     id |  created_by   |          content           | comments |   
     ---+---------------+----------------------------+----------+
      1 | Anonymous     | does this thing work?      |          | 
      2 | James         | this is the body           |          | 
      3 | Chan          | this must work this time~! |          | 
      4 | Freak         | just to add something new  |          | 
      5 | Anonymous     | yahoo!                     |          | 

What do I mean by multidimensional table? It would look like something like this if there's such thing.

     id |  created_by   |          content           |             comments                     |   
     ---+---------------+----------------------------+------------------------------------------+
      1 | Anonymous     | does this thing work?      | id | created_by |         comments       | 
      2 | James         | this is the body           |                 |created_by|   comments  | 
      3 | Chan          | this must work this time~! |                                  | 
      4 | Freak         | just to add something new  |                                  | 
      5 | Anonymous     | yahoo!                     |                                  | 

This is just an example. But the key concept is that in every comment, there's another set of columns, making comments sort of like a table by itself.

So yeah, does this exist in Postgres or is there any better way to implement this feature? :)

maverick97
  • 132
  • 2
  • 8
  • What problem are you trying to solve? Although this *could* be done using a custom type and an array, I highly doubt that it would be a good idea. If all you want is "dynamic attributes" you should look at the `hstore` datatype –  Apr 20 '14 at 20:12
  • How reddit's commenting system works is what I'm trying to imitate: http://www.reddit.com/r/gaming/comments/23ippa/orbital_mechanics/. A comment table holding information to a comment and another comment table. I thought of using arrays but thought it would be messy... Hmm okay, will look "dynamic attributes" up. – maverick97 Apr 20 '14 at 20:18
  • 2
    That's a standard hierarchical data model. Search for "adjacency list model" (you essentially store the PK of the "parent row" in the table) –  Apr 20 '14 at 20:20
  • Here is an example: http://sqlfiddle.com/#!15/c9d06/4 –  Apr 20 '14 at 20:29
  • Thanks for that sqlfiddle! Exactly what I want. But would you be able to explain it? There are some parts I do not understand. – maverick97 Apr 21 '14 at 08:53
  • I guess you are referring to the `with` part. It's a recursive query. See the manual for an introduction: http://www.postgresql.org/docs/current/static/queries-with.html –  Apr 21 '14 at 09:04
  • Yup, thanks for the link. – maverick97 Apr 21 '14 at 09:07
  • Read about hstore, json, XML, EAV. Lots of questions here address this topic already. See: http://stackoverflow.com/a/17655754/398670, http://dba.stackexchange.com/q/27057/7788, http://stackoverflow.com/q/13294003/398670 . Also `ltree` for tree strutures. Though on reading more, looks like horse has nailed it with the real goal being an adjacency list. – Craig Ringer Apr 21 '14 at 11:54

1 Answers1

2

I would like to convince you, if possible, to not encode your data this way, (independent of how terrible an idea it is)

Lets suppose you have a really hot post, goes viral, et-cetera. That means all of your users are viewing it and many are trying to comment on it. with all of your nested discussion embedded in a single row, all updates must apply to that row. This in turn means that every update on that discussion competes with every other to update that one attribute. As you might imagine, this write contention will make your database slow way down.

A second reason is that it violates the rules of first normal form; in the sense that the comment attribute on the table you're showing contains more than one value. The motivating reasoning for this widely applied rule is that it makes a larger number of queries possible. In your design, it would be very difficult to delete from COMMENTS where USER = 'spammy-user'*, or even select * from COMMENTS where text like '%Trending Topic%'. In general, if you might ever want to look at part of a value in a column, rather than the whole thing, then you're probably looking at an opportunity for normalization.

The rule I try to use is "each 'kind of thing' gets its own table". as comments are a 'kind of thing', we'll split them out:

create table COMMENTS(
    COMMENT_ID serial primary key,
    POST_ID integer not null foreign key references POSTS(ID),
    PARENT_COMMENT_ID integer foreign key references COMMENTS(COMMENT_ID),
    CREATED_BY ...
    CONTENT ...
)

with the convention that comments having a null parent_comment_id are the roots of threaded discussions.

SingleNegationElimination
  • 151,563
  • 33
  • 264
  • 304
  • Oh okay, thanks for the insightful opinion. I'm thinking of limiting the number of levels it can go. Right, null parent keys, will try that out! Will upvote you when I have enough reputation to do so haha. – maverick97 Apr 21 '14 at 08:56