
I have a source (over which I have no control) with a table like:

CREATE TABLE temp.badtable(
    Id       INTEGER
   ,Meta     TEXT
   ,Field001 FLOAT
   ,Field002 FLOAT
   ,...
   ,Field720 FLOAT
);

where field names are predictable and types are homogeneous. I would like to normalize this table before processing its data. The well-shaped table I am after should look like:

CREATE TABLE temp.goodtable(
    Id         INTEGER
   --,Meta       TEXT -- For normalization sake
   ,FieldName  TEXT
   ,FieldValue FLOAT
);

Is there a proper way to do so with PostgreSQL?

Update

I got the beginning of a solution using JSON. It is not performant, but it achieves what I need:

WITH

A AS (
-- Serialize each whole row into a single JSON object
SELECT
    Id
   ,row_to_json(M) AS Data
FROM
    temp.badtable AS M
),

B AS (
-- Expand each object into one {"key", "value"} pair per column
SELECT
    Id
   ,row_to_json(json_each(Data)) AS Pair
FROM
    A
)

SELECT
    Id
   ,Pair->>'key' AS FieldKey
   ,Pair->>'value' AS FieldValue
FROM B;
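
A more direct variant of the same idea (see Abelisto's comment below) does the unpivot in a single pass with to_jsonb and a lateral join. A sketch, assuming every value column matches field% and casts cleanly to FLOAT:

SELECT
    T.Id
   ,J.FieldName
   ,J.FieldValue
FROM
    temp.badtable AS T
   ,LATERAL (
        -- Expand the row to text key/value pairs and keep only Field columns
        SELECT K AS FieldName, V::FLOAT AS FieldValue
        FROM jsonb_each_text(to_jsonb(T)) AS E(K, V)
        WHERE K ILIKE 'field%' AND V IS NOT NULL
    ) AS J;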
  • No need to store Meta so many times. Why not create another table with the columns id, fieldname, fieldvalue? (And keep only id and meta in the first table.) – jarlh Jul 05 '17 at 08:38
  • @jarlh I have no control over the source table. And Meta is not really my concern (though yes, there is no need to duplicate it so much). Anyway, this does not explain how to normalize. – jlandercy Jul 05 '17 at 08:40
  • Seems like you want a view, with un-pivot? – jarlh Jul 05 '17 at 08:41
  • What exactly do you mean by "Is there a proper way to do so"? Do you need an SQL statement that puts the data from `badtable` into `goodtable`? – a_horse_with_no_name Jul 05 '17 at 08:41
  • @a_horse_with_no_name, yes I am looking for a SQL way to transform this table. – jlandercy Jul 05 '17 at 08:42
  • @jarlh, Yes it is a kind of un-pivot, do you know how to do so? – jlandercy Jul 05 '17 at 08:42
  • `insert into goodtable (id, fieldname, fieldvalue) select id, 'field001', field001 from badtable union all select id, 'field002', field002 from badtable ...` – a_horse_with_no_name Jul 05 '17 at 08:44
  • @a_horse_with_no_name, You mean dynamic SQL (PL/PgSQL) with UNION ALL in a loop? – jlandercy Jul 05 '17 at 08:45
  • That would be an option if you don't want to type all that manually. – a_horse_with_no_name Jul 05 '17 at 08:46 (a PL/pgSQL sketch of this approach follows these comments)
  • Of potential interest: [so_post](https://stackoverflow.com/questions/13168066/transpose-rows-and-columns-a-k-a-pivot-only-with-a-minimum-count) and [postgres docs](https://www.postgresql.org/docs/9.5/static/tablefunc.html) – cole Jul 05 '17 at 09:55
  • `/*create table goodtable as*/select t.id, j.k as field_name, j.v as field_value from badtable as t cross join lateral (select k, v::float from jsonb_each_text(to_jsonb(t)) as j(k,v) where k ilike 'field%' and v is not null) as j;` – Abelisto Jul 05 '17 at 10:38
  • related : https://stackoverflow.com/q/14758912/905902 – wildplasser Jul 05 '17 at 12:02
  • You are not normalizing. Normalization to higher NFs replaces a table by projections of it when there are problems. E.g.: when *any* column is a function of a set of columns that doesn't include a PK or UNIQUE. Sometimes when a table always equals the join of two projections of it where the set of columns shared by them is unique. When a table always equals a many-table join that can't be expressed as pairwise joins. Are any of these so? Also, a table should have a PK. Anyway, you should do a proper design for what your table holds. Right now you are moving from one bad design to another (EAV). – philipxy Jul 07 '17 at 23:31
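
For reference, a_horse_with_no_name's UNION ALL suggestion above can be generated rather than typed by hand. A minimal PL/pgSQL sketch, assuming temp is a regular schema (not PostgreSQL's temporary-table schema), that temp.goodtable already exists, and that every value column matches field%:

DO $$
DECLARE
    Col TEXT;
    Sql TEXT := '';
BEGIN
    -- Build one SELECT per Field column, glued together with UNION ALL
    FOR Col IN
        SELECT column_name
        FROM information_schema.columns
        WHERE table_schema = 'temp'
          AND table_name   = 'badtable'
          AND column_name LIKE 'field%'
        ORDER BY column_name
    LOOP
        Sql := Sql || format(
            'SELECT Id, %L AS FieldName, %I AS FieldValue FROM temp.badtable UNION ALL ',
            Col, Col);
    END LOOP;
    -- Drop the trailing UNION ALL and run the generated statement
    Sql := left(Sql, length(Sql) - length(' UNION ALL '));
    EXECUTE 'INSERT INTO temp.goodtable (Id, FieldName, FieldValue) ' || Sql;
END;
$$;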

2 Answers


If that table is not normalized, that would mean that not all of these fields depend only on the primary key (id?).

If, for example, field004 and field005 depend on field003, like product name and product description depend on the product ID, you'd remove field004 and field005 from the table, create a new table that contains only these three fields with field003 as primary key, and establish a foreign key relationship from the main table to this new table.

That way you would end up with several tables, each describing certain entities, with relationships between them. You'd save space and automatically guarantee integrity, because you avoid redundant data.
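
For illustration only, with hypothetical column meanings (say field003 carries a product ID, field004 a product name, and field005 a product description), the decomposition might look like:

CREATE TABLE product(
    ProductId   INTEGER PRIMARY KEY -- was field003
   ,Name        TEXT                -- was field004
   ,Description TEXT                -- was field005
);

CREATE TABLE main(
    Id        INTEGER PRIMARY KEY
   ,ProductId INTEGER REFERENCES product (ProductId)
   -- ... the remaining fields that depend only on Id
);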

What you suggest above has nothing to do with normalization. It might be a possible solution if most of the fields in the table are NULL (a “sparse table”).

But in that case you might indeed be better off with a solution using JSON like Joey Pinto's answer suggests.

  • Maybe the term is not exact; is unpivot better for you? – jlandercy Jul 05 '17 at 10:48
  • What you suggest (normalizing to 3NF) is not impossible. It comes down to 1) finding lower-cardinality columns, 2) finding same-or-lower-cardinality columns that depend on these; 3) the hard part is when the key in 1) is composite. – wildplasser Jul 05 '17 at 10:48 (a sketch for step 1 follows these comments)
  • @wildplasser I don't understand what you are saying. Normalization to higher NFs has a procedure, and one should always decompose a design to 5NF, then consciously denormalize. – philipxy Jul 07 '17 at 23:10
  • All fields always functionally depend on a PK by definition of PK. So your first sentence really isn't at all characterizing when a table ought to be normalized. (Nothing can imply that fields don't depend on the PK, because it can't happen.) You are trying to say something like: if any column functionally depends on a set of columns that is non-unique, then the table should be decomposed. (That would give BCNF.) – philipxy Jul 07 '17 at 23:20
  • @philipxy I have added an "only" to clarify this. You are right that everything depends on the PK. But if the table is not normalized, a certain field will also depend on something else. In the example (products) I used in my answer, you couldn't just update the product name without also updating the product ID to keep the data consistent. – Laurenz Albe Jul 08 '17 at 03:59
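
As a starting point for step 1 in wildplasser's comment above, the planner statistics give a quick, approximate view of column cardinalities, assuming the table has been ANALYZEd:

SELECT
    attname
   ,n_distinct -- negative values are fractions of the row count
FROM pg_stats
WHERE schemaname = 'temp'
  AND tablename  = 'badtable'
ORDER BY n_distinct;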

Consider using a JSON[] in PostgreSQL.
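
A minimal sketch of the single-object variant discussed in the comments below, using jsonb and a hypothetical target table:

-- Hypothetical target: one jsonb object of field/value pairs per row
CREATE TABLE temp.jsontable(
    Id   INTEGER PRIMARY KEY
   ,Meta TEXT
   ,Data JSONB -- e.g. {"field001": 1.5, "field002": null, ...}
);

-- Populate it from the wide table in one statement
INSERT INTO temp.jsontable (Id, Meta, Data)
SELECT Id, Meta, to_jsonb(T) - 'id' - 'meta'
FROM temp.badtable AS T;

Individual values can then be read back with Data->>'field001' and cast to FLOAT as needed.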

  • I thought it could be a solution, but I am lost with those functions. Could you explain a bit more why you think it is a solution (even if it has a big overhead)? – jlandercy Jul 05 '17 at 08:44
  • An array of JSON objects makes no sense at all. A **single** JSON object with key/value pairs might be an option. But definitely not an array of JSON objects. –  Jul 05 '17 at 08:45
  • Your data doesn't seem to be structured and seems to need NoSQL concepts. You are lucky enough to be using PostgreSQL, as MySQL can't help with this at all. If you are dealing with mostly retrieval queries based on other parameters and indices, the overhead won't matter much. Only joins or any other operations indexed on those fields will pay the price. – Joey Pinto Jul 05 '17 at 08:48
  • Of course you can, if you don't want to store field names explicitly. You can consider both options: JSON or JSON[] will do the job. – Joey Pinto Jul 05 '17 at 08:49
  • `[{name:field1,value:value1},{name:field2,value:value2}]` using a JSON array, or simply `{field1:value1,field2:value2}` using JSON; it's the user's choice @a_horse_with_no_name – Joey Pinto Jul 05 '17 at 08:51
  • @JoeyPinto I am sorry, but this is not a solution; it is more of a hint. If you would develop a bit how to do this properly, I will accept it. – jlandercy Jul 12 '17 at 19:30