0

I'm developing an application which has very specific/odd database requirements.

I need to store collections of strings. If it helps, one string per collection is designated as "canonical", and the others are "alternatives". The list of collections needs to be ordered, so that I can easily get e.g. the 8th string collection in the list. Finally, I need to be able to insert arbitrary amounts of collections into any point in the list.

Obviously a relational database with a simple incremental id column would be great for this except for the fact that you have to be able to insert collections into any point in the list. Then you'd have to update every single item after the inserted one, which isn't ideal.

I'm new to databases. I barely know how to use relational databases, much less anything else. Should I use SQL for this, or should I ditch relational databases for a NoSQL approach? I'm lost.

Community
  • 1
  • 1
strugee
  • 2,752
  • 4
  • 19
  • 30

2 Answers2

1

For structured ordering in a database, you'll want a column to store the ordering. An int is fine, but a decimal can make updates a little easier. Don't use the name "order" for the column, as it's a keyword, consider "display_order".

You can create a self-referencing table. A "canonical" term has a null parent id, an "alternative" does not.

create table terms (
  term_id int primary key,
  term text not null,
  parent_term_id int null references terms(term_id),
  display_order int not null,

  unique (parent_term_id, term), 
  unique (parent_term_id, display_order)
);

Add a canonical term:

inset into terms (term_id, term, parent_id, display_order) values
(1, 'United Kingdom', null, 0);

Add alternative terms, associated w the above:

inset into terms (term_id, term, parent_id, display_order) values
(2, 'Great Britain', 1, 0),
(3, 'England', 1, 1);

You can use Recursive Common Table Expressions (except in MySQL) to efficiently query this structure.

See Using a sort order column in a database table regarding updating your order column.

Community
  • 1
  • 1
Neil McGuigan
  • 46,580
  • 12
  • 123
  • 152
0

Another form of structuring data in an RDBMS is nested sets. I've played around with them and if they meet your need, they can be vastly superior to self-referential designs.

One important consideration is that the ratio of data "movement" (inserts, deletes, updates to change the position) to queries must be low. In other words, the data should be relatively stable.

With a nested set design, just one self-join is all that is needed to traverse an entire tree to any arbitrary depth. No recursion! That alone makes it a big selling point for me. :)

Unfortunately, while there is information available on the Internet, it is all fairly basic. But if you don't mind playing around to get familiar with it, you just might be glad you did.

TommCatt
  • 5,498
  • 1
  • 13
  • 20
  • can you explain what a "nested set" and a "self-join" is? remember, I'm new to databases. thanks! – strugee Apr 19 '15 at 04:51
  • 1
    A self join is where a table is joined with itself: `select * from Table t1 join Table t2 on t2.something = t1.otherthing`. As for nested sets, a comprehensive explanation is much too long to put here. Just google "database nested set" and get a long list of good references. Here are a couple: http://www.codeproject.com/Articles/4155/Improve-hierarchy-performance-using-nested-sets and http://en.wikipedia.org/wiki/Nested_set_model. There is admittedly little in the way of queries and DML code, I had to mostly work those out myself. But if you run into issues, ask here again. – TommCatt Apr 19 '15 at 21:56
  • Adjacency List is superior to Nested Set in almost every case. NS is only useful for databases that don't support recursion. http://explainextended.com/2009/09/25/adjacency-list-vs-nested-sets-sql-server/ http://explainextended.com/2009/09/24/adjacency-list-vs-nested-sets-postgresql/ – Neil McGuigan Apr 20 '15 at 17:41
  • Well, that's your opinion. It has its pros and cons...everything does. I mentioned the big con in my answer. Recursion is "cool", but that doesn't mean it's necessary in all cases where it can be used. In fact, after my own fascination wore off, I eventually went back over years of Java code to find most of my uses of recursion and, if it couldn't be easily changed to using tail recursion, just change it to loops. Of course, SQL doesn't have looping so recursion, where available, solves a lot of problems. But it's not the only tool in the toolbox. – TommCatt Apr 20 '15 at 19:25