I'm not very familiar with SQL and have hit a problem with duplicate-removal that I need some help with.
I have two tables as follows:
config -> ID -----------
-> timestamp |
-> vcsID |
-> platform |
|
data -> ID |
-> configID <-----
-> value
For each row in the config table there may be multiple rows in the data table, the join being:
data.configID = config.ID
However, in the config table there may be entries in which vcsID has not advanced (these represent builds on the same code from our version control system, and 'value' represents a build metric such as memory size); they are not duplicate rows, because the timestamp will be different, but I need to get rid of them. I want to create a view on the data that removes these duplicates. How do I go about doing that? In MySQL I can do the following:
select *
from (select * from config group by vcsID) as filtered
inner join data
on data.configID = filtered.ID
...but Postgres is strictly compliant with the SQL standard and requires that anything in the Select clause must be in the Group By clause so I can't get the ID field out of the sub-query to do the join.
As an example, some sample data might be:
config
ID timestamp vcsID platform
1 1/1/2014 09:00 18 "x"
2 1/1/2014 20:20 30 "y"
3 1/1/2014 20:25 30 "y"
4 1/1/2014 20:40 31 "y"
data
ID configID value
12 1 40000
13 2 125
14 3 125
15 4 130
...and, taking platform "y" as the thing I'm interested in, what I'd like to get out is:
config | data
ID timestamp vcsID platform | ID configID value
2 1/1/2014 20:20 30 "y" | 13 2 125
4 1/1/2014 20:40 31 "y" | 15 4 125
or, equivalently:
config | data
ID timestamp vcsID platform | ID configID value
3 1/1/2014 20:25 30 "y" | 14 3 125
4 1/1/2014 20:40 31 "y" | 15 4 125
Any help is greatly appreciated.