MySQL Select Statement DISTINCT for Multiple Columns

Question

I am currently trying to construct a somewhat tricky MySQL Select Statement. Here is what I am trying to accomplish:

I have a table like this:

data_table

uniqueID      stringID          subject
  1             144           "My Subject"
  2             144           "My Subject - New"
  3             144           "My Subject - Newest"
  4             211           "Some other column"

Bascially, what I'd like to do is be able to SELECT/GROUP BY the stringID (picture that the stringID is threaded) and not have it duplicated. Furthermore, I'd like to SELECT the most recent stringID row, (which in the example above is uniqueID 3).

Therefore, if I were to query the database, it would return the following (with the most recent uniqueID at the top):

uniqueID   stringID    subject
 4          211        "Some other column"  
 3          144        "My Subject - Newest" //Notice this is the most recent and distinct stringID row, with the proper subject column.

I hope this makes sense. Thank you for you help.

Are you 100% certain uniqueID will always be the highest ID in the table? If not, I suggest you add a timestamp for the latest. — lexu, Jul 25 '09 at 06:37
I actually do have a timestamp column (not included in my above example). So, how would I go about using my timestamp column then? Does MAX work with a timestamp column? Thanks. — , Jul 25 '09 at 06:57

Andrew Moore · Accepted Answer · 2009-07-25T06:49:08.477

9

Try the following. It might not be the most efficient query, but it will work:

SELECT uniqueID, stringID, subject
FROM data_table
WHERE uniqueID IN
 (
  SELECT MAX(uniqueID) 
  FROM data_table
  GROUP BY stringID
 )
ORDER BY uniqueID DESC

edited Jul 25 '09 at 06:49

answered Jul 25 '09 at 06:34

Andrew Moore

93,497
30
163
175

1

This query helped the most. Also, I replaced the 'uniqueID' with lexu's suggestion above, using the timestamp. Thanks very much for your help. – Jul 25 '09 at 07:04
2

I found this searching for a solution to a similar problem. It's a good solution, but there's a performance boost to be had by using a temporary table in place of the subselect. Create a temp table based on the sub select, then where the sub select in the main query goes, put select * from temp table in its place. On my data set of 80000+ rows, the subselect method took minutes to run, while using a temp table instead took about 15 seconds. – GordonM Nov 09 '10 at 11:27

score 3 · Answer 2 · edited Aug 27 '11 at 20:52

3

SELECT DISTINCT(a),
  ( SELECT DISTINCT(b) ) AS b,
  ( SELECT DISTINCT(c) ) AS c

FROM tblMyTBL

WHERE...
Order By...
Etc.

edited Aug 27 '11 at 20:52

CoolBeans

20,654
10
86
101

answered Jan 07 '11 at 18:30

Bill Frederickson

31
1

mechanical_meat · Answer 3 · 2009-07-25T07:09:04.490

2

Edit: Based on new info provided by the OP in a comment, this would be preferable to relying on uniqueID:

select t.uniqueID
       , t.stringID
       , t.subject
       , t.your_timestamp_col
from   data_table t
       left outer join data_table t2
       on t.stringID = t2.stringID
    and
       t2.your_timestamp_col > t.your_timestamp_col
where  t2.uniqueID is null

If, as lexu mentions in a comment, you are certain that the highest uniqueID value always corresponds with the newest subject, you could do this:

select t.uniqueID
       , t.stringID
       , t.subject
from   data_table t
       left outer join data_table t2
       on t.stringID = t2.stringID
    and
       t2.uniqueID > t.uniqueID
where  t2.uniqueID is null

Which basically means: return to me only those records from data_table where there exists no higher uniqueID value.

edited Jul 25 '09 at 07:09

answered Jul 25 '09 at 06:49

mechanical_meat

163,903
24
228
223

1

It will actually perform worse. The subquery does not use any of the superqueries columns, and therefore, is computed only once. A `max` is much quicker than trying to compare each id one by one. Moreover, the join will then have to apply the `where` clause. The subquery, however, will create a hash table which serves as a lookup to each of the ID's. Ergo, only one comparison, and we don't have to check the column after all the comparisons are done. – Eric Jul 25 '09 at 06:53
@Eric - Your argument makes sense but [unfortunately MySQL doesn't currently work that way](http://stackoverflow.com/questions/3417074/why-would-an-in-condition-be-slower-than-in-sql/3417190#3417190) – Martin Smith Aug 27 '11 at 21:32

score 0 · Answer 4 · answered Aug 03 '12 at 21:23

0

I had a similar situation and found a different query. Try this:

SELECT MAX(uniqueID), stringID, subject
 FROM data_table
 GROUP BY stringID

answered Aug 03 '12 at 21:23

WebMuse

1

When providing code that solves the problem, it is best to also give at least a short explanation of how it works so that folks reading won't have to mentally parse it line by line to understand the differences. – Fluffeh Sep 28 '12 at 08:35

score -1 · Answer 5 · edited Oct 11 '12 at 11:18

private void LoadAllFamilyMembers(string relationShip)
        {
            lbFamilyMembers.SelectedIndexChanged -= new EventHandler(lbFamilyMembers_SelectedIndexChanged);
            SqlCommand cmd = new SqlCommand("select familymemberid,name from FamilyMembers where relationship = @relationship", con);
            cmd.Parameters.AddWithValue("@relationship", relationShip);
            DataTable dt = new DataTable();
            SqlDataAdapter adapter = new SqlDataAdapter(cmd);
            adapter.Fill(dt);
            lbFamilyMembers.DataSource = dt;
            lbFamilyMembers.DisplayMember = "name";
            lbFamilyMembers.ValueMember = "familymemberid";
            lbFamilyMembers.SelectedIndex = -1;
            lbFamilyMembers.SelectedIndexChanged += new EventHandler(lbFamilyMembers_SelectedIndexChanged);
        }

MySQL Select Statement DISTINCT for Multiple Columns

5 Answers5

Linked