
I've got four tables in a PostgreSQL 9.3.6 database:

  • sections
  • fields (child of sections)
  • entries (child of sections)
  • data (child of entries)

CREATE TABLE sections (
  id serial PRIMARY KEY,
  title text,
  "group" integer
);

CREATE TABLE fields (
  id serial PRIMARY KEY,
  title text,
  section integer,  -- id of the section this field belongs to
  type text,
  "default" json
);

CREATE TABLE entries (
  id serial PRIMARY KEY,
  section integer   -- id of the section this entry belongs to
);

CREATE TABLE data (
  id serial PRIMARY KEY,
  data json,
  field integer,    -- id of the field this value is for
  entry integer     -- id of the entry this value belongs to
);

I'm trying to generate a page that looks like this:

section title

          field 1 title  | field 2 title | field 3 title 
entry 1 | data 'as' json | data 1 json   | data 3 json        <-- table
entry 2 | data 'df' json | data 5 json   | data 6 json  
entry 3 | data 'gh' json | data 8 json   | data 9 json  

The way I have it set up right now, each piece of 'data' has an entry it's linked to, a corresponding field (that field has columns that determine how the data's json value should be interpreted), a json column to store different types of data, and an id (1-9 here in the table).

In this example there are 3 entries and 3 fields, and there is one data row for each of the cells in between.

It's set up like this because one section can have different field types and quantities than another section, and therefore different quantities and types of data.

Challenge 1:

I'm trying to join the tables together in a way that makes the result sortable by any of the columns (the contents of the data json for that field). For example, I want to be able to sort by field 3 (the third column) in reverse order, so the table would look like this:

section title

          field 1 title  | field 2 title | field 3 title 
entry 3 | data 'gh' json | data 8 json   | data 9 json  
entry 2 | data 'df' json | data 5 json   | data 6 json  
entry 1 | data 'as' json | data 1 json   | data 3 json        <-- table

I'm open to doing it another way too if there's a better one.

Challenge 2:

Each field has a 'default value' column. Ideally I only have to create 'data' rows when they hold a value that isn't that default value. So the table might actually look like this if field 2's default value were 'asdf':

section title

          field 1 title  | field 2 title | field 3 title 
entry 3 | data 'gh' json | data 8 json   | data 9 json  
entry 2 | data 'df' json | 'asdf'        | data 6 json  
entry 1 | data 'as' json | 'asdf'        | data 3 json        <-- table

2 Answers


The key to writing this query is understanding that you just need to fetch all the data for a single section, and the rest you just join. Also, with your schema you can't filter data by section directly, so you'll need to join entries just for that:

SELECT d.* FROM data d JOIN entries e ON (d.entry = e.id) 
WHERE e.section = ?

You can then join field to each row to get defaults, types and titles:

SELECT d.*, f.title, f.type, f."default" 
FROM data d JOIN entries e ON (d.entry = e.id) 
            JOIN fields f ON (d.field = f.id)
WHERE e.section = ?

Or you can select fields in a separate query to save some network traffic.
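That separate lookup could be as simple as this (a sketch against the original schema, with ? again standing in for the section id):

SELECT id, title, type, "default"   -- one row per field of the section
FROM fields
WHERE section = ?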

So that was the answer; now for some bonuses:

  1. Use foreign keys instead of plain integers to refer to other tables; this lets the database check consistency for you.

  2. Relations (tables) should be named in the singular by convention, so it's section, entry and field.

  3. Referring columns are named <name>_id, e.g. field_id or section_id, also by convention.

  4. The whole point of JSON fields is to store a collection of data that isn't statically defined, so it would make much more sense not to use the entries and data tables, but a single table with a JSON column containing all the fields instead.

Like this:

CREATE TABLE row ( -- less generic name would be even better
    id int primary key,
    section_id int references section (id),
    data json
);

With data fields containing something like:

{
    "title": "iPhone 6",
    "price": 650,
    "available": true,
    ...
}
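With a layout like that, Challenge 1 works directly on 9.3, because you can sort by a key inside the JSON. A rough sketch (the table name item and the index name are just placeholders for whatever you call the table above):

-- sort one section's rows by a key inside the json column
SELECT id, data
FROM item
WHERE section_id = 1
ORDER BY data->>'title';

-- optional expression index to speed up sorting/filtering on that key
CREATE INDEX item_data_title_idx ON item ((data->>'title'));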
  • I will do 1-3. Would you be able to sort by say 'title' in any way with the setup you have in 4? Also that query you provided looks very similar to what I'm looking for. I would probably do the field select in a separate query though. Is there any way to sort the results by data with one of those queries? If so I have no idea how - I just need a general direction – Max Hudson Apr 02 '15 at 06:48
  • In PostgreSQL 9.3 you can access values in JSON fields, so yes, you will be able to sort by title. You can also create a functional index to make it faster. And you can upgrade to PostgreSQL 9.4 and get binary JSON and GIN indexes for that. – Suor Apr 02 '15 at 07:04

@Suor has provided good advice, some of which you already accepted. I am building on the updated schema.

Schema

CREATE TABLE section (
  section_id serial PRIMARY KEY,
  title text,
  grp integer
);

CREATE TABLE field (
  field_id serial PRIMARY KEY,
  section_id integer REFERENCES section,
  title text,
  type text,
  default_val json
);

CREATE TABLE entry (
  entry_id serial PRIMARY KEY,
  section_id integer REFERENCES section
);

CREATE TABLE data (
  data_id serial PRIMARY KEY,
  field_id integer REFERENCES field,
  entry_id integer REFERENCES entry,
  data json
);

I changed two more details:

  • section_id instead of id, etc. "id" as a column name is an anti-pattern that's gotten popular since a couple of ORMs use it. Don't. Descriptive names are much better. Identical names for identical content is a helpful guideline. It also allows you to use the shortcut USING in join clauses (as in the queries below).

  • Don't use reserved words as identifiers. Use legal, lower-case, unquoted names exclusively to make your life easier.

Referential integrity?

There is another inherent weakness in your design: what stops rows in data from referencing a field and an entry that don't belong to the same section? There is a closely related question on dba.SE.
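One possible way to plug that hole (a sketch on top of the schema above; constraint names are just illustrative): carry section_id redundantly in data and point composite foreign keys at UNIQUE constraints on the parent tables, so both parents must agree on the section.

-- make the (id, section) pairs referenceable
ALTER TABLE field ADD CONSTRAINT field_section_uni UNIQUE (field_id, section_id);
ALTER TABLE entry ADD CONSTRAINT entry_section_uni UNIQUE (entry_id, section_id);

-- carry section_id in data and force both references to agree on it
ALTER TABLE data
   ADD COLUMN section_id integer
 , ADD CONSTRAINT data_field_fk FOREIGN KEY (field_id, section_id)
       REFERENCES field (field_id, section_id)
 , ADD CONSTRAINT data_entry_fk FOREIGN KEY (entry_id, section_id)
       REFERENCES entry (entry_id, section_id);

For the check to actually bite, section_id in data then has to be filled in and declared NOT NULL, since a NULL in the composite key skips the foreign key check.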

Query

Not sure if you need the complex design at all. But to answer the question, this is the base query:

SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
FROM   entry     e
JOIN   field     f USING (section_id)
LEFT   JOIN data d USING (field_id, entry_id)  -- can be missing
WHERE  e.section_id = 1
ORDER  BY 1, 2;

The LEFT JOIN is crucial to allow for missing data entries and use the default instead.

SQL Fiddle.

crosstab()

The final step is cross tabulation. Cannot show this in SQL Fiddle since the additional module tablefunc is not installed.

Basics for crosstab():

SELECT * FROM crosstab(
   $$
   SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
   FROM   entry     e
   JOIN   field     f USING (section_id)
   LEFT   JOIN data d USING (field_id, entry_id)  -- can be missing
   WHERE  e.section_id = 1
   ORDER  BY 1, 2
   $$
  ,$$SELECT field_id FROM field WHERE section_id = 1 ORDER BY field_id$$
   ) AS ct (entry int, f1 json, f2 json, f3 json)  -- static
ORDER  BY f3->>'a';  -- static

The tricky part here is the return type of the function. I provided a static type for 3 fields, but you really want that to be dynamic. Also, I am referencing a key inside the json value that may or may not be there ... So build that query dynamically and execute it in a second call.
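One way the first of those two steps could look (just a sketch: it builds the SQL text for section 1, generating one fN json output column per field):

SELECT format(
$f$SELECT * FROM crosstab(
   $$
   SELECT entry_id, field_id, COALESCE(d.data, f.default_val) AS data
   FROM   entry     e
   JOIN   field     f USING (section_id)
   LEFT   JOIN data d USING (field_id, entry_id)
   WHERE  e.section_id = %1$s
   ORDER  BY 1, 2
   $$
  ,$$SELECT field_id FROM field WHERE section_id = %1$s ORDER BY field_id$$
   ) AS ct (entry_id int, %2$s)$f$
     , 1                                                                 -- section_id
     , string_agg(format('f%s json', field_id), ', ' ORDER BY field_id)  -- "f1 json, f2 json, ..."
       )
FROM   field
WHERE  section_id = 1;

Running the generated text is then the second call mentioned above.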

  • If you need to write queries like that you probably designed your schema badly. – Suor Apr 02 '15 at 07:55
  • @Suor: That may well be. There is another problem with referential integrity. I added a bit. – Erwin Brandstetter Apr 02 '15 at 08:26
  • If I eliminate the data table and add a data json column to the entry table, I think that would eliminate the need for join at all - I could select the fields. Then select the entries and sort by whichever key in the json field I wanted. Then when I create the table, if the entry json key for that column/field is set I use that value, otherwise I use the default. The only tricky part is efficiently updating the data json field it seems then assuming sorting by a json key works well in 9.3 – Max Hudson Apr 02 '15 at 14:32
  • Is the crosstab step even necessary here? It seems to be working fine in the fiddle without it? What does the crosstab part do? – Max Hudson Apr 02 '15 at 18:19
  • @maxhud: It converts values of the column `field_id` into separate columns to match the format in your question. Follow [the link I provided](http://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905) for details. – Erwin Brandstetter Apr 02 '15 at 18:24
  • The crosstab query returns a table that looks like this: 1;"";"";"" 2;"";"";"" 3;"";"";"" but I do see why I need the crosstab type query. The three f json columns aren't working – Max Hudson Apr 02 '15 at 18:41
  • @maxhud: Sorry, I had a bug: We need the matching `field.field_id`, not `field.title`. – Erwin Brandstetter Apr 02 '15 at 18:46
  • You really are a database expert. Thank you so much! You've been very helpful. – Max Hudson Apr 02 '15 at 18:47
  • Is there a way I can store "String" in the json column instead of {"key": "value"} and sort by the column not the key within the column? – Max Hudson Apr 02 '15 at 19:50
  • @maxhud: Please start a new question. Comments are not the place. You can always link to this one for context. – Erwin Brandstetter Apr 02 '15 at 20:05