This table is the (fake example) result of a big messy query.
=> WITH data(run, host, status, field_ip, control_ip, field_data, control_data) AS (
VALUES (1, 1, 'no_control', '10.0.0.1', NULL, 'foo', NULL),
(1, 2, 'good', '10.0.0.1', '10.0.0.1', 'bar', 'bar'),
(1, 3, 'problem_1', '10.0.0.1', NULL, 'bar', NULL),
(1, 3, 'problem_2', '10.0.0.2', NULL, 'baz', NULL),
(1, 3, NULL, NULL, '192.168.1.1', NULL, 'wallace'))
SELECT * FROM data;
run | host | status | field_ip | control_ip | field_data | control_data
-----+------+------------+----------+-------------+------------+--------------
1 | 1 | no_control | 10.0.0.1 | | foo |
1 | 2 | good | 10.0.0.1 | 10.0.0.1 | bar | bar
1 | 3 | problem_1 | 10.0.0.1 | | bar |
1 | 3 | problem_2 | 10.0.0.2 | | baz |
1 | 3 | | | 192.168.1.1 | | wallace
I want to merge the "control_ip" and "control_data" fields from rows where they are NOT NULL onto rows where they are NULL, according to the following rules: Consider each group of rows with the same 'run' and 'host' values independently. Within each such group:
- Each row where all the fields are NOT NULL is to be output unchanged and removed from the group. (For example, the row (1,2) is to be output unchanged. It's the only row in its group, so that group is now done.)
- For each row where the 'field_ip' and 'field_data' fields are NOT NULL, but the 'control_ip' and 'control_data' fields are NULL, find another row in the group where the opposite is true, and copy its 'control_ip' and 'control_data' fields onto the first row. Then output the modified row and remove it from the group.
- If 'another row where the opposite is true' does not exist, output the first row unchanged (for example, row (1,1) is to be output unchanged).
- If there is more than one 'another row where the opposite is true', pick one arbitrarily, and discard it after copying its values. (No example of this in the sample data set.)
- If there is only one 'another row where the opposite is true', do not discard it after copying its values. (For example, row (1,3,NULL,NULL,'192.168.1.1',NULL,'wallace') should have its control_ip/control_Data values copied onto both the (1,3,problem_1,10.0.0.1) and (1,3,problem_2,10.0.0.2) rows.)
- Repeat step 2 until the group has no more rows where the 'field_ip' and 'field_data' fields are NOT NULL.
- Discard all remaining rows in the group (which will all have NULL field_ip and field_data).
The desired output for the above sample data is:
run | host | status | field_ip | control_ip | field_data | control_data
-----+------+------------+----------+-------------+------------+--------------
1 | 1 | no_control | 10.0.0.1 | | foo |
1 | 2 | good | 10.0.0.1 | 10.0.0.1 | bar | bar
1 | 3 | problem_1 | 10.0.0.1 | 192.168.1.1 | bar | wallace
1 | 3 | problem_2 | 10.0.0.2 | 192.168.1.1 | baz | wallace
This is a similar problem to How to concatenate text from multiple rows into a single text string in SQL server?, but that answer won't work for me because GROUP BY run, host
plus aggregates can only emit a single row per group, whereas I need in some cases to emit multiple rows per group. I tried some things involving GROUP BY run, host, field_ip
but then the third (1,3) row gets treated as its own group, which is no good. Also, I am using PostgreSQL (12), not SQL Server, and AIUI PIVOT is a sql-server-ism.
Note: the last step of the big messy query, producing the table shown, is the result of a FULL JOIN on run, host, and field_ip = control_ip. If it would be easier to do what I am asking for by starting from the precursors to the FULL JOIN, please tell me.
What do you suggest?