Initial data (actual table contains more than 2,000,000 rows):
+--------+--------+-------+
| note | factor | label |
+--------+--------+-------+
| note_1 | 1 | 2 |
+--------+--------+-------+
| note_1 | 1 | 3 |
+--------+--------+-------+
| note_1 | 2 | 4 |
+--------+--------+-------+
| note_2 | 123 | 2 |
+--------+--------+-------+
| note_2 | 123 | 3 |
+--------+--------+-------+
| note_2 | 2 | 4 |
+--------+--------+-------+
| note_3 | 456 | 4 |
+--------+--------+-------+
| note_4 | 434 | 5 |
+--------+--------+-------+
| note_5 | 456 | 3 |
+--------+--------+-------+
| note_5 | 456 | 4 |
+--------+--------+-------+
What I want to get (further final table):
+----+-----------------+
| id | notes |
+----+-----------------+
| 1 | {note_1,note_2} |
+----+-----------------+
| 2 | {note_4} |
+----+-----------------+
| 3 | {note_3,note_5} |
+----+-----------------+
More clearly:
I need to group notes
by factor
and label
columns. Note can be in the result table only once. Result table should contains two columns: id
- row number, notes
- array of notes.
I have written a query to group by factor
and label
:
select row_number() over (order by factor) as id
, array_agg(note order by note) as notes
from test_brand
group by factor, label
It gives these results:
+---+-----------------+
| 1 | {note_1} |
+---+-----------------+
| 2 | {note_1} |
+---+-----------------+
| 3 | {note_2} |
+---+-----------------+
| 4 | {note_2} |
+---+-----------------+
| 5 | {note_1,note_2} |
+---+-----------------+
| 6 | {note_4} |
+---+-----------------+
| 7 | {note_5} |
+---+-----------------+
| 8 | {note_3,note_5} |
+---+-----------------+
But I do not know how to get the final table proceeding from here.
If we omit identifiers and return to ordinary numbers, then this task looks like a union of sets (which in fact it is).
Let's say we have 8 sets: {1}, {1}, {2}, {2}, {1,2}, {4}, {5}, {3,5}. We need to get three sets: {1,2}, {4}, {3,5}.
How it happened in my opinion:
Sets {1}, {1}, {2}, {2}, {1,2} merged into one set {1,2}, because there is intersection between {1} and {2} with {1,2}.
Sets {3,5}, {5} merged into one set {3,5}, because there is intersection between {5} and {3,5}.
Set {4} does not intersect with anyone, so it remains as it is.