I have a Postgres 9.1.4 table with a few million rows. A very small version could be this:
master
index location
----------------
1 A
2 C
3 B
4 C
5 C
6 A
I need to do an expensive calculation on each distinct value in the location field. I don't want to use this field from the master table because I would be repeatedly processing the same location. I want a table with distinct values of location, and the result of the calculation will be stored in a calculation field:
distinct
index location calculation
------------------------------
1 A' X
2 C' Y
3 B' Z
The relationship between master and distinct is not easily determined after distinct is populated. I have to do a little data manipulation to make the locations work in the calculation. I really need a third table, created roughly at the same time I populate distinct, to help me correlate each entry in distinct back to its parent in master.
The third table might look like this:
correlation
master_index distinct_index
------------------------------
1 1
2 3
3 2
4 3
5 3
6 1
The problem is I don't see how this would be done with any straightforward SQL. I could use something like this as the beginning of a query to populate distinct:
SELECT location, array_agg(index)
FROM master
GROUP BY location;
The problem is I would need another array column in distinct to hold these values, then I'd later need to use some other program to parse the arrays and construct the correlation table.
Am I missing a simpler way of doing this?