0

I have this kind of email table

id | url | id1 | id2 | mail_date | message

Example of my rows

id | url | id1 | id2 | mail_date | message

1  | aaa | 2   | 8   | JAN 2016  | blah blah
2  | bbb | 4   | 8   | FEB 2016  | blah blah
3  | aaa | 8   | 2   | MAR 2016  | blah blah
4  | bbb | 8   | 4   | APR 2016  | blah blah
5  | bbb | 4   | 8   | MAY 2016  | blah blah
6  | aaa | 2   | 8   | JUN 2016  | blah blah
7  | bbb | 8   | 4   | JUL 2016  | blah blah
8  | aaa | 8   | 2   | AUG 2016  | blah blah

I have this kind of query

SELECT DISTINCT url, id1, id2 FROM email WHERE id1 = 8 OR id2 = 8 ORDER BY mail_date DESC 

My problem is that when I do this whether in phpMyAdmin or in php code I get a result like this:

url   | id1 | id2
bbb   | 8   | 4
aaa   | 8   | 2
bbb   | 4   | 8
aaa   | 2   | 8

What I want is to get results that who has a latest email sent or received by a user, like this:

url   | id1 | id2
aaa   | 8   | 2
bbb   | 8   | 4
aaa   | 2   | 8
bbb   | 4   | 8

What I want is the DISTINCT will pick the unique and the latest (based on mail_date) rows in the email table, like my want example.

url   | id1 | id2
aaa   | 8   | 2 //AUG 2016
bbb   | 8   | 4 //JUL 2016
aaa   | 2   | 8 //JUN 2016
bbb   | 4   | 8 //MAY 2016

BUT! DISTINCT will pick the first unique row of the query and it will get the older mail_date, like my problem example.

url   | id1 | id2
bbb   | 8   | 4 //APR 2016
aaa   | 8   | 2 //MAR 2016
bbb   | 4   | 8 //FEB 2016
aaa   | 2   | 8 //JAN 2016

Is there a way DISTINCT will cooperate with ORDER BY mail_date the way that I wanted to?

UPDATE

Sorry guys, I made a mistake regarding with url, I updated it now.

Dumb Question
  • 365
  • 1
  • 3
  • 14

4 Answers4

1

Distinct will only eliminate records which have the exact same data in all the columns selected. Since your URL varies for each record, distinct will not eliminate any records..

If what you're after is the most recent occurrence of a Sender/Receiver and it's URL, you can achieve this by first getting a set of data with the max date for each sender/receiver(ID1/ID2) combination. Then join this as a limit back to the base set.

SELECT E1.URL, E1.ID1, E1.ID2
FROM email E1
INNER JOIN (SELECT ID1, ID2, max(mail_date) max_mail_date
            FROM email 
            GROUP BY ID1, ID2) E2
  on E1.ID1 = E2.ID1
 and E1.ID2 = E2.ID2
 and E1.mail_date = E2.max_mail_date
WHERE E1.ID1 = 8 or E1.ID2 = 8
ORDER BY mail_date desc

An alternative to the join which would likely be faster would be to use an exists instead of the inline view. Since we want the max date we also need to use a HAVING since the aggregate value isn't available in the where clause due to order of execution.

SELECT E1.URL, E1.ID1, E1.ID2
FROM email E1
WHERE (E1.ID1 = 8 or E1.ID2 = 8)
 AND  EXISTS (SELECT max(E2.mail_date) maxDate
            FROM email E2
            WHERE E1.ID1 = E2.ID1
              and E1.ID2 = E2.ID2
            GROUP BY ID1, ID2
            HAVING E1.mail_date = maxDate)
ORDER BY E1.mail_date desc
xQbert
  • 34,733
  • 2
  • 41
  • 62
  • `E1` and `E2` are two tables right? If so, I have only 1 table. – Dumb Question Jan 05 '17 at 02:01
  • This works dude, I did quite understand your code when I actually tested it on phpmyadmin and also in php. – Dumb Question Jan 05 '17 at 04:19
  • 1
    E1 and E2 are table aliases. In the the 1st example we perform a self join on a subset of data which has been filtered to only contain the most recent email for the combination of Id1 and ID2. The second one is likely the most efficient in terms of performance. It uses the concept of a correlated subquery which also is using 2 sets of data from the same table. The sub query joins based on the ID1 and ID2 combination and then uses a having clause to limit the results to the email for each ID1, ID2 combination with the max mail_date. – xQbert Jan 05 '17 at 13:29
  • 1
    Put simply: E1 and E2 are ALIASES on the same table (dataset/recordset) and we filter each alias differently to get the desired results. In the 2nd select of each query (E2) we get just the max(mail_date) per ID1, ID2. This allows us to use these 3 pieces of data to get the desired URL from the entire set of data. (E1). In terms of performance with proper indexes on ID1, ID2 and mail_date, I know of no other higher performing query than the 2nd one above. – xQbert Jan 05 '17 at 13:33
  • Thanks, as an additional question, every time I `insert` or `update` to database I need to include the `index` on the query? – Dumb Question Jan 05 '17 at 14:27
  • 1
    No; at least not unless there is a problem with the engine making bad choices, which doesn't happen often (and you'd have to see over time if it's not using the index as expected). You should let the database engine determine which existing index to use or not use to achieve the best possible performance. You just need to make sure you have an index on email table for ID1, ID2 and date. Performance tuning is a science unto itself, so best let the engine decide what is optimal don't tweek until you know you have a problem with performance. You just need an index on ID1, ID2 and mail_date – xQbert Jan 05 '17 at 14:29
  • Thanks, I am still studying of index as of now, thank you again. – Dumb Question Jan 05 '17 at 14:32
  • Good day, just to make sure, all I need to do is click `Index` in the column that I want to be indexed in phpMyAdmin right? – Dumb Question Jan 07 '17 at 02:06
1

Here is my attempt:

http://sqlfiddle.com/#!9/32c11f/4

SELECT e.url, e.id1, e.id2 
FROM email e
LEFT JOIN email e1
ON e.url = e1.url
  AND e.id1 = e1.id1
  AND e.id2 = e.id2 
  AND e.mail_date > e1.mail_date
WHERE e.id1 = 8 OR e.id2 = 8 
  AND e1.url IS NULL
ORDER BY e.mail_date DESC;
Alex
  • 16,739
  • 1
  • 28
  • 51
0

Updated:

MySQL version 5.7.14

select l.* 
from email l
inner join (
  select 
    id1, id2, max(mail_date) as latest 
  from email
    where id1 = 8 OR id2 = 8
  group by id1, id2
) r
  on l.mail_date = r.latest and l.id1 = r.id1 and l.id2 = r.id2
order by mail_date desc

If you want more detail you can check this link: Using ORDER BY and GROUP BY together

Hope this help you.

Community
  • 1
  • 1
gobliggg
  • 237
  • 2
  • 13
0

You can make use of a sub-query. First get the results ordered by date and then group them.

SELECT * FROM ( SELECT url, id1, id2 FROM email WHERE id1 = 8 OR id2 = 8 ORDER BY mail_date DESC ) tmp GROUP BY tmp.url

UPDATE:

@xQbert What I have understood now is that the required result set should contain a sender-recipient email communication done on the most recent date.

select t1.url, t1.id1, t1.id2, t1.mail_date 
from email t1 
inner join (
    select t2.*, max(t2.mail_date) last_date
    from email t2 
    group by t2.id1, t2.id2 
    order by last_date desc
) t2 on t1.id1=t2.id1 and t1.id2=t2.id2 and t1.mail_date=t2.last_date
where t1.id1=8 or t1.id2=8
codeit
  • 111
  • 8
  • All the URLS are different in the test data set. So this will return all 8 records, when I think the post is only asking for 4 out of the 8 in test data set. (the last 4 based on date I believe) – xQbert Jan 04 '17 at 15:38
  • I don't quite understand this, I searched about `JOINS` before and it appears that this can be used for joining two tables. But I only have one table in my example.. – Dumb Question Jan 05 '17 at 03:53
  • This join type is known as SELF JOIN, joining a table to itself. It is used when we need to join a result set to another from the same table. The tables are distinguished by giving an alias. Here, we first get the rows grouped by IDs and ordered by the recent date (t2) and then join it to the base table (t1). – codeit Jan 05 '17 at 05:18