For fixed three columns this could be a possible solution.
http://sqlfiddle.com/#!17/45dc7/137
Disclaimer: This will not work if there could be same values in different columns. E.g. One row (42, NULL, "A42", NULL)
and one row (23, "A42", NULL, NULL)
will end in unwanted results. The fix for that is to concatenate a column identifier with an unique delimiter to the string and remove it after the operation by string split.
WITH test_table as (
SELECT *,
array_remove(ARRAY[column1,column2,column3], null) as arr, -- A
cardinality(array_remove(ARRAY[column1,column2,column3], null))as arr_len
FROM test_table )
SELECT
s.array_agg as aggregates, -- G
MAX(tt.column1) as column1,
MAX(tt.column2) as column2,
MAX(tt.column3) as column3
FROM (
SELECT array_agg(id) FROM -- E
(SELECT DISTINCT ON (t1.id)
t1.id, CASE WHEN t1.arr_len >= t2.arr_len THEN t1.arr ELSE t2.arr END as arr -- C
FROM
test_table as t1
JOIN -- B
test_table as t2
ON t1.arr @> t2.arr AND COALESCE(t2.column1, t2.column2, t2.column3) IS NOT NULL
OR t2.arr @> t1.arr AND COALESCE(t1.column1, t1.column2, t1.column3) IS NOT NULL
ORDER BY t1.id, GREATEST(t1.arr_len, t2.arr_len) DESC -- D
) s
GROUP BY arr
UNION
SELECT
ARRAY[id]
FROM test_table tt
WHERE COALESCE(tt.column1, tt.column2, tt.column3) IS NULL) s -- F
JOIN test_table tt ON tt.id = ANY (s.array_agg)
GROUP BY s.array_agg
A: Aggregate the column values and removing the NULL
values. The reason is that I check for subsets later which will not work with NULL
s. This is the point where you should add the column identifier as mentioned in the disclaimer above.
B: CROSS JOIN
the table against itself. Here I am checking if one column aggregate is a subset of another. The rows with only NULL
values are ignored (this is the COALESCE
function)
C: Getting the column array with the highest length either from the first or from the second table. It depends on its id.
D: With the ORDER BY
the longest array and the DISTINCT
it is assured that only the longest array is given for each id
E: Now there are many ids with the same column array sets. The array sets are used to aggregate the ids. Here the ids are put together.
F: Add all NULL
rows.
G: One last JOIN
against all columns. The rows are taken that are part of the id aggregation from (E). After that the MAX
value is grouped per column.
Edit: Fiddle for PostgreSQL 9.3 (array_length
instead of cardinality
function) and added test case (8, 'A2', 'A3', 'A8')
http://sqlfiddle.com/#!15/8800d/2