4

I'm modelling articles and their authors, where each article can have multiple authors and each author can have multiple articles. It is important to keep the order of authors for a given article (being the 1st on the list of authors of a given article is very different from being the last author). How do I best keep the order of authors? (I am using postgres, 9.4)

The classical many-to-many via an intermediate table of "author_id, article_id" doesn't keep the order of authors for a given article. I see two solutions:

  1. add an author_rank column to the author_articles table, making it "author_id, article_id, author_rank"

  2. scrap the intermediate table, add an "author_ids" column in the article table and keep an array of author ids in there. (Losing automatic checks of referential integrity but gaining simplicity)

Any other solution? Any reason one of the two solution is much worse? Thanks!

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
kmptkp
  • 138
  • 1
  • 8
  • Does this answer your question? [Best representation of an ordered list in a database?](https://stackoverflow.com/questions/9536262/best-representation-of-an-ordered-list-in-a-database) – philipxy Mar 01 '20 at 00:59
  • [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/q/3653462/3404097) [What's the best way to store an array in a relational database?](https://stackoverflow.com/q/36678545/3404097) – philipxy Mar 01 '20 at 00:59

2 Answers2

3

The preferred option is add a author_rank column.

So you will have:

  1. an Article table containing all columns you need in this table and a unique ID for each article (article_id)

  2. an Author table containing all columns you need in this table and a unique ID for each author (author_id)

  3. Third table author_articles table connecting both tables together by article_id and author_id with an the additional column you suggested (author_rank)

This is the simplest and easiest to impliment in terms of user interface.

The query is easy to write.

To get the article authors, you supply the article_id and order by author_rank to author_articles table.

It is a good practice to keep only related information together in one table. This way if you ever want to link to yet another table, say a reviewer table: a set of people who have reviewed the article then you do not end up going back to the designing article table.

I would not go for second option at all.

You are not actually gaining any simplicity, infact you are making it more complicated. e.g. placing all author_Ids in one column: you would need to write code to put togather author_Ids and separate them when viewing. You will end up writing a peice of software rather then just using a query.

I hope this helps.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
0

I would strongly advise you to use the first solution, under almost any circumstances. I should say that with Postgres, this decision is harder than with most other databases. Postgres has strong support for arrays, and you can probably do most of what you want with the second solution.

That said, the first solution allows you to do many things that you probably want to do:

  1. Probably most importantly, you can declare explicit foreign key relationships. This allows the database to ensure data integrity.
  2. The first method makes it much simpler to answer a question such as "how many books has an author written".
  3. The first method makes it much simpler to answer a question such as "which two authors have written the most books together".
  4. The first method makes it much simpler to answer a question such as "which books share an author".

And so on. (I recognize that all except 1 can be queried using arrays; I know how to do that.)

I don't think that the second method (using an array) is "simpler". The lack of referential integrity is a big issue, as the database gets used. The need to understand "non-standard" array functionality (as opposed to standard SQL) limits the audience who can directly query the database.

All that said, there are some circumstances where arrays would be a good choice. I just doubt that this is one of those use cases.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Ad 4) "which books share *two* authors? Which books share *all* authors ? (relational division) – joop Jan 31 '17 at 12:53