3

I have a MySQL table that holds many entries with repeated IDs (for various reasons) So you might have something like

ID TIME DATA
1  xx   xx
2  xx   xx
3  xx   xx
1  xx   xx
3  xx   xx

What query can I run through PHP to select each ID only once? So I would like my result set to look like

ID TIME DATA
1  xx   xx
2  xx   xx
3  xx   xx
VolkerK
  • 95,432
  • 20
  • 163
  • 226
  • 2
    What would TIME and DATA be in the result? The values of the first record having a specific id (then the question is why)? Or the result of some aggregate function (Count(), Sum() maybe even Group_Concat())? or... what? – VolkerK Jun 23 '09 at 19:10
  • DISTINCT does not solve your problem. – Sinan Ünür Jun 23 '09 at 19:49

6 Answers6

14

The suggestion given by @karim79 and @Matthew Jones, that DISTINCT(ID) can fix this problem, is a common misconception about the way DISTINCT works in SQL. It doesn't help, because DISTINCT always applies to the whole row, not a single column. The parentheses are irrelevant, as they would be in the queries SELECT (1) versus SELECT 1.

The answer given by @T Pops is actually helpful in this case, because MySQL handles GROUP BY in a nonstandard way. See my answer to "Selecting a Column not in GROUP BY" for an explanation.

Another solution is to use LEFT OUTER JOIN creatively to query for the first row per ID. For instance, assuming that the TIME column is unique for a given ID, you could do the following:

SELECT t1.*
FROM MyTable t1 LEFT OUTER JOIN MyTable t2
  ON (t1.ID = t2.ID AND t1.TIME > t2.TIME)
WHERE t2.ID IS NULL;

If t1 is pointing to the earliest row in the group, then there can be no matching row in t2 with an earlier date. Therefore, the LEFT OUTER JOIN will find no match, and leave t2.* as NULL.

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Excellent explanation. I'll come back and vote you up in about eight hours ;-) – Sinan Ünür Jun 23 '09 at 19:48
  • @Sinan: Is that a comment that JOIN would have poor performance? You should try using EXPLAIN to view the optimization plan of a query like this. Then compare it to the plan for a GROUP BY or DISTINCT query, which usually have to use a temp table. – Bill Karwin Jun 23 '09 at 19:54
  • @Bill I ran out of votes for today, so I could not upvote your answer. It is important to make the point that DISTINCT is not an appropriate "solution" here. And the rest of your explanation is very clear. There was no sarcasm in my response, just genuine regret that I could not upvote you right at that moment. – Sinan Ünür Jun 23 '09 at 20:55
  • Thanks! LOL -- I thought you were being snarky, as if you expected the query to take eight hours to finish, and you were waiting for that before casting a vote. :-) – Bill Karwin Jun 23 '09 at 21:08
  • this question is not worth a +1 until this is answer is the official solution. – shigeta Jan 26 '13 at 17:20
  • This is awesome and very clever. Thank you so much! – Paul Jun 22 '17 at 14:51
5

It sounds like you're looking for GROUP BY.

You can use a query similar to:

SELECT id, time, data GROUP BY ID

You'll definitely have to do some tweaking to get that query to work with the structure you have but that's the basic idea.

Tizag has a great reference for using GROUP BY with MySQL and PHP.

Travis
  • 12,001
  • 8
  • 39
  • 52
  • +1 This would actually work in MySQL, because MySQL handles GROUP BY in a nonstandard way. – Bill Karwin Jun 23 '09 at 19:12
  • Is it actually guaranteed to return time + data from the same row? – Andomar Jun 23 '09 at 19:18
  • No, it makes no guarantee about which row it takes the values from. In practice, it takes them from the row that is stored physically first. If that's what the OP wants, then he can use that (assuming MySQL doesn't change behavior in a future version). – Bill Karwin Jun 23 '09 at 19:25
  • See http://stackoverflow.com/questions/1023347/mysql-selecting-a-column-not-in-group-by/1023435#1023435 – Bill Karwin Jun 23 '09 at 19:28
  • But once it picks a row, it will take all columns from that row? It wouldn't take time from row 1 and data from row 4? – Andomar Jun 23 '09 at 19:37
  • Andomar, it should take whichever row was physically first in the DB. I believe rows are stored with all the columns together so there's no way for row 1's DATA column to come before row 2's DATA column. I think the asker's question is too vague for a definitive answer but this is as close as I could come. – Travis Jun 23 '09 at 20:15
  • The tricky part is when you have aggregate functions like MAX() in your query. The non-aggregated columns still take their values from the first row in the group, even if that isn't the row with the MAX() value. – Bill Karwin Jun 23 '09 at 21:11
  • Anyway, it's moot because the OP's actual problem is totally different from what he asked. >:-[ – Bill Karwin Jun 23 '09 at 21:12
4

Take a ganders at SELECT DISTINCT

Aiden Bell
  • 28,212
  • 4
  • 75
  • 119
  • You guys are awesome! DISTINCT works perfectly. Group By is also good, but I think I need distinct in this case. Thanks to everyone who replied! –  Jun 23 '09 at 19:18
  • Didn't see that green coming :) Nice. – Aiden Bell Jun 23 '09 at 19:20
  • 2
    Heh, I'd be interested in seeing the actual DISTINCT query that solved this problem!:) – Andomar Jun 23 '09 at 19:21
  • There isn't much too it Andomar... $sql = "SELECT DISTINCT post_id FROM vote_records WHERE title LIKE '".$query1."' OR title LIKE '".$query2."' OR title LIKE '".$query3."' OR title LIKE '".$query4."' OR title LIKE '".$query5."'"; –  Jun 23 '09 at 19:52
  • 1
    @Rudebuyrock ... Would a JOIN on the distinct ids not be better? – Aiden Bell Jun 23 '09 at 20:59
  • 1
    Aaah so you're just looking for the id, not the other columns. That explains it :) – Andomar Jun 23 '09 at 21:14
  • 1
    But he still has the TIME and DATA columns in the result in his question. – Sinan Ünür Jun 23 '09 at 21:18
0

Add a real primary key to the table, maybe called RealId.

Then you can:

select *
from YourTable
where realid in (
    select min(realid)
    from YourTable
    group by id
)

Which would give you one row per id.

Andomar
  • 232,371
  • 49
  • 380
  • 404
0
SELECT * FROM MyTable GROUP BY ID ORDER BY ID ASC
user3668456
  • 141
  • 1
  • 9
-1

The SQL Query would look something like this:

SELECT DISTINCT (ID), TIME, DATA FROM TableName
Matthew Jones
  • 25,644
  • 17
  • 102
  • 155
  • 1
    That would select both records (1,'2009-05-16','x'),(1,'2009-04-11','y') because they differ in TIME and DATA. – VolkerK Jun 23 '09 at 19:02
  • I KNEW i forgot something. Thanks! – Matthew Jones Jun 23 '09 at 19:03
  • Which is why I don't program in MySQL; it's too permissive. Oh well. – Matthew Jones Jun 23 '09 at 19:11
  • @Matthew Jones: This has nothing to do with MySQL. DISTINCT works the same in standard SQL as it does in MySQL, or any other brand of database. DISTINCT *always* applies to the whole row, not to a single column, even if you use parentheses. – Bill Karwin Jun 23 '09 at 19:22