I have 2 tables that can be simplified to this structure:
Table 1:
+----+----------+---------------------+-------+
| id | descr_id | date | value |
+----+----------+---------------------+-------+
| 1 | 1 | 2013-09-20 16:39:06 | 1 |
+----+----------+---------------------+-------+
| 2 | 2 | 2013-09-20 16:44:06 | 1 |
+----+----------+---------------------+-------+
| 3 | 3 | 2013-09-20 16:49:06 | 5 |
+----+----------+---------------------+-------+
| 4 | 4 | 2013-09-20 16:44:06 | 894 |
+----+----------+---------------------+-------+
Table 2:
+----------+-------------+
| descr_id | description |
+----------+-------------+
| 1 | abc |
+----------+-------------+
| 2 | abc |
+----------+-------------+
| 3 | abc |
+----------+-------------+
| 4 | DEF |
+----------+-------------+
I want to join the description into table1, filter by description so I only get rows where description=abc, and filter out "duplicate" rows, where two rows are duplicates if they have the same value and their dates are within 6 minutes of one another. My desired output table is below, (assuming abc is the desired description filter).
+----+----------+---------------------+-------+-------------+
| id | descr_id | date | value | description |
+----+----------+---------------------+-------+-------------+
| 1 | 1 | 2013-09-20 16:39:06 | 1 | abc |
+----+----------+---------------------+-------+-------------+
| 3 | 3 | 2013-09-20 16:49:06 | 5 | abc |
+----+----------+---------------------+-------+-------------+
The query I came up with is:
select *
from (
select *
from table1
join table2 using(descr_id)
where label='abc'
) t1
left join (
select *
from table1
join table2 using(descr_id)
where label='abc'
) t2 on( t1.date<t2.date and t1.date + interval 6 minute > t2.date)
where t1.value=t2.value.
Unfortunately this query takes over a minute to run with my dataset, and returns no results (although I believe there should be results). Is there a more efficient way to perform this query? Is there a way to name a derived table and reference it later in the same query? Also, why does my query return no results?
Thanks in advance for help!
edit: I would like to retain the first of several samples with close-together timestamps.
My table1 has 6.1 million rows, my table2 has 30K, which makes me realize that table2 would only have one row for description "abc". This means that I can just query for descr_id beforehand, then use that id to avoid joining table2 in the big query at all, making it much more efficient. However, if my table2 was setup as outlined above (which would be poor database design, I admit) what is a good way to perform such a query?