Selecting Unique Values Not in the Substring

Question

I can't wrap my head around this for some reason. I'm trying to get the ID from a table where multiple columns are distinct... basically something like -

Select ID from table where ID in (Select distinct ID, Card, PunchTime, PunchDate)

Though that obviously doesn't work. I want to get the IDs that are unique with all of those fields as the criteria. I can't seem to come up with syntax that works. I'm not sure what else I can say about it, it seems lke it should be simple when I look at it... but I've been bouncing off of it since yesterday and nothing is working. Anyone know which way I should be going? Thank you in advance!

edit: The posted things work but the results weren't what I expected. Here's some saple data:

ID  Card  PunchDate  PunchTime In/Out
================================
1  00123  3/17/2012  13:00  1
2  00123  3/17/2012  17:00  2
3  00123  3/17/2012  17:00  1
4  00123  3/17/2012  20:00  2
5  00456  3/17/2012  14:00  1
6  00456  3/17/2012  17:00  2

The reason I'm trying to do this is that the timekeeping software decides that anything with a card, punchdate, and punchtime that is identical to another is a duplicate, regardless of whether it is an in or out punch and deletes one. My only solution is to eliminate the duplicates and basically make the punches from the first in punch to the last outpunch where they are duplicated. So my goal is to select only the unique values based on card, punchdate, and punchtime. However what I have is not excluding the ID in the matter which is making it a unique value. I have a workaround for this so time isn't particularly an issue, but I would much rather figure out how I can get the right data.

Thank you all again for the fast replies!

So you want to find out where a person punched in or out without having an opposite event for that person on that day? (punched in w/o punching out, punched out w/o punching in) — SQLMason, Mar 29 '12 at 14:38
If you have a punch in and a punch out at the same time, how do you want to decide which to keep, the in, or the out? — MatBailie, Mar 29 '12 at 14:43
No I'm just trying to find the punches that are duplicates amongst the values of card, punchdate, and punchtime. So if someone has a punchout for 17:00 and a punch in for 17:00 on the same day, then it doesn't select those. For 00123 it would just go from 13:00 to 20:00 because the 17:00 punches would be eliminated. This would give the same total hours but eliminate the duplicate. Edit: I want to eliminate both and simply go from the first punch to the fourth punch. — RobLife, Mar 29 '12 at 14:44
What RDBMS? Your problem is actually not what you stated previously, and the answer will depend on your RDBMS.......Also, are you supposed to just ignore the duplicates altogether? What if you get an in, (out,in,out--duplicate) and then no other out? What are your business rules for these situations? Basically, what do you do if they punch in at 1300, then out-in-out at 2000. You basically want the first in of the day and the last out of the day? — Justin Pihony, Mar 29 '12 at 14:45
Really then you want the MIN PunchTime where the In/Out is 1 and the MAX Punchtime where the In/Out is 2. — SQLMason, Mar 29 '12 at 14:46
Do you want to SELECT the non-duplicates, or DELETE the duplicates? I've posted the SELECT for you, but it's easy enough to turn into a DELETE. — MatBailie, Mar 29 '12 at 14:47
@DanAndrews - He wants to select the non-duplicates, not select the duplicates. Also, what if there are two punch in's at the same time? — MatBailie, Mar 29 '12 at 14:48
To sum up all of the questions, you need to post the full requirements, especially how to deal with the edge cases (double IN's, ODD number duplicate punches, ???) — Justin Pihony, Mar 29 '12 at 14:50
I'm using SQL Server Management Studio. Yes I'm trying to select the non-duplicates an import those into our time tracking software. There isn't really an issue with duplicate times, that is all handled by my front end. Really the only thing that comes through is the fact that the staff can enter these punches with a stop time that happens at the same time as another start time. It's allowed by the business rules but is incompatible with the time tracking software. — RobLife, Mar 29 '12 at 14:51
@Endyo - That's exactly what my answer does. Select the non-duplicates - Filtering out any records where another record exists with matching (card,punchdate,punchtime). — MatBailie, Mar 29 '12 at 15:01
@Dems, that's what mine did too :) However I re-answered the question after this new information. — SQLMason, Mar 29 '12 at 15:02

Justin Pihony · Accepted Answer · 2012-03-29T15:29:13.370

2

UPDATED ANSWER WITH NEW INFORMATION:

SELECT *
FROM TABLE
WHERE NOT EXISTS
(
    SELECT 1 
    FROM TABLE AS Duplicates
    WHERE Duplicates.Card = TABLE.Card
        AND Duplicates.PunchDate = TABLE.PunchDate
        AND Duplicates.PunchTime = TABLE.PunchTime
        AND Duplicates.ID != TABLE.ID
)

Basically, this is saying, get all of the records that do not have the same card, punchdate, punchtime (making sure to not count the same row against itself.)

edited Mar 29 '12 at 15:29

answered Mar 29 '12 at 14:10

Justin Pihony

66,056
18
147
180

@Endyo I have updated my answer to fit with what should work for your new requirements. – Justin Pihony Mar 29 '12 at 15:29
Wonderful, simple and effect. Thank you. – RobLife Mar 29 '12 at 18:37

SQLMason · Answer 2 · 2012-03-29T17:45:35.273

With the assumption that there is no 2nd shift which starts on one day and ends on the next ...

Table:

DECLARE @table TABLE
    (
      [ID] INT IDENTITY,
      [Card] INT,
      [PunchDate] DATETIME,
      [PunchTime] DATETIME,
      [In/Out] TINYINT
    )
INSERT  INTO @table
        (
          [Card],
          [PunchDate],
          [PunchTime],
          [In/Out]
        )
        SELECT  00123,
                '3/17/2012',
                '3/17/2012 13:00',
                1
        UNION ALL
        SELECT  00123,
                '3/17/2012',
                '3/17/2012 17:00',
                2
        UNION ALL
        SELECT  00123,
                '3/17/2012',
                '3/17/2012 17:00',
                1
        UNION ALL
        SELECT  00123,
                '3/17/2012',
                '3/17/2012 20:00',
                2
        UNION ALL
        SELECT  00456,
                '3/17/2012',
                '3/17/2012 14:00',
                1
        UNION ALL
        SELECT  00456,
                '3/17/2012',
                '3/17/2012 17:00',
                2

Query:

SELECT  [Card],
        [PunchDate],
        MIN([PunchTime]) [PunchTime],
        [In/Out]
FROM    @table
WHERE   [In/Out] = 1
GROUP BY [Card],
        [PunchDate],
        [In/Out]
UNION
SELECT  [Card],
        [PunchDate],
        MAX([PunchTime]) [PunchTime],
        [In/Out]
FROM    @table
WHERE   [In/Out] = 2
GROUP BY [Card],
        [PunchDate],
        [In/Out]
ORDER BY [Card],
        [PunchDate]

Result:

Card    PunchDate   PunchTime   In/Out
123 2012-03-17 00:00:00.000 2012-03-17 13:00:00.000 1
123 2012-03-17 00:00:00.000 2012-03-17 20:00:00.000 2
456 2012-03-17 00:00:00.000 2012-03-17 14:00:00.000 1
456 2012-03-17 00:00:00.000 2012-03-17 17:00:00.000 2

Next he'll want this:

SELECT  a.[Card],
        a.[PunchDate],
        a.[PunchTime],
        b.[PunchTime],
        DATEDIFF(hour, a.[PunchTime], b.[PunchTime]) TotalTime
FROM    (
          SELECT    [Card],
                    [PunchDate],
                    MIN([PunchTime]) [PunchTime]
          FROM      @table
          WHERE     [In/Out] = 1
          GROUP BY  [Card],
                    [PunchDate]
        ) a
        INNER JOIN (
                     SELECT [Card],
                            [PunchDate],
                            MAX([PunchTime]) [PunchTime]
                     FROM   @table
                     WHERE  [In/Out] = 2
                     GROUP BY [Card],
                            [PunchDate]
                   ) b
            ON a.[Card] = b.[Card]
               AND a.[PunchDate] = b.[PunchDate]
ORDER BY a.[Card],
        a.[PunchDate]

Result

Card    PunchDate   PunchTime   PunchTime                   TotalTime
123 2012-03-17 00:00:00.000 2012-03-17 13:00:00.000 2012-03-17 20:00:00.000 7
456 2012-03-17 00:00:00.000 2012-03-17 14:00:00.000 2012-03-17 17:00:00.000 3

The OP wants the whole record, not just a part, where there are no other records with the same (card, punchdate, punchtime). — MatBailie, Mar 29 '12 at 15:02
But he definitely needs the `In/Out` field as well, and quite possibly the `id` field. — MatBailie, Mar 29 '12 at 15:20
You're right, adding the In/Out. I don't agree with the ID since he's just importing it to his time tracking system. — SQLMason, Mar 29 '12 at 15:24
This is also an effective solution and I feel like at one point I started to go this direction but got another idea that ultimately failed. However, I didn't need to get the total time because the time tracking software does that. I just have to get it from my application to that application in the proper format. — RobLife, Mar 29 '12 at 18:39
When deleting, you'd probably not want to delete by ID, if there would be duplicates, then it will screw you up. If you delete by card, date, and time, it will delete the duplicates - which is why I didn't show the ID. — SQLMason, Mar 29 '12 at 20:50

score 1 · Answer 3 · answered Mar 29 '12 at 14:35

1

Select
  *
FROM
  table
WHERE
  NOT EXISTS (
    SELECT
      *
    FROM
      table AS lookup
    WHERE
          ID       <> table.ID
      AND Card      = table.Card
      AND PunchTime = table.PunchTime
      AND PunchDate = table.PunchDate
  )

answered Mar 29 '12 at 14:35

MatBailie

83,401
18
103
137

This is also the solution and great. – RobLife Mar 29 '12 at 18:40
@Dems A side note, the standard with EXISTS is to use a SELECT 1 as there is no point in pulling any rows back. EXISTS just cares if a value was returned at all – Justin Pihony Mar 29 '12 at 18:42
@JustinPihony I agree, I usually do a `SELECT TOP 1 1` – SQLMason Mar 29 '12 at 19:08
@JustinPihony - Are you *absolutely* certain about that? Received wisdom is often worth challenging. Not only must you mean `fields` rather than `rows`, no rows *or* fields are actually returned at all. If you research this, it is not possible to measure a difference. And a MS Engineer is on record as saying the `*` *may* be expanded early in the compilation to execution plan, but is definitely discarded soon after. The belief that `*` is more expensive than `1` is most likely rooted in a general lack of understanding that SQL is actually compiled and not executed naively and verbatim. – MatBailie Mar 29 '12 at 19:41
@Dems Thanks, I would have assumed this, but the vast number of people that advocate this made me believe otherwise...I guess I get burned by assumptions (from others) again :p. I found this SO question that does advocate what you are saying http://stackoverflow.com/questions/1597442/subquery-using-exists-1-or-exists But, yes I do know it is compiled, and this makes perfect sense :) – Justin Pihony Mar 29 '12 at 19:51

Teja · Answer 4 · 2012-03-29T14:22:14.277

Select ID 
from table 
where ID 
in 
(SELECT A.ID FROM (Select distinct ID, Card, PunchTime, PunchDate) A);

In your query you have written.. you should have only the same number of columns outside the IN clause as well as inside IN clause. If you are using a single column you don't require any brackets outside the IN clause but if you have multiple columns then you need to include them with in the brackets.

Thumb Rule: SELECT Col1,Col2 ..Coln FROM TABLE WHERE Col1 IN (SELECT Col1 FROM TABLE ...) (For Single Column)

SELECT Col1,Col2 ..Coln FROM TABLE WHERE (Col1,Col2..Coln) IN (SELECT Col1,Col2..Coln FROM TABLE ...) (For Multiple Columns)

Why not just subtable it to begin and pull out the column you need? — Justin Pihony, Mar 29 '12 at 14:19

Icarus · Answer 5 · 2012-03-29T14:16:05.120

0

Select ID from table where ID in (
select ID from (
   Select distinct ID, 
   Card, 
   PunchTime, 
   PunchDate
   FROM 
   OTHER_TABLE
   ) x
)

You can only have one column in the IN clause; therefore, you need to alias (x- on my answer) the result and just select the ID column from there. Also note that inside the subselect, you need to specify the table where you are selecting the extra columns from (See CAPS on my answer).

edited Mar 29 '12 at 14:16

answered Mar 29 '12 at 14:11

Icarus

63,293
14
100
115

Why not subtable instead of the unnecessary nesting to use an IN? – Justin Pihony Mar 29 '12 at 14:22
@JustinPihony You are absolutely right, the question is too fuzzy in my opinion, I wasn't trying to make it efficient (your answer is far better in terms of efficiency) but rather pointing out why the OP's syntax wasn't working. – Icarus Mar 29 '12 at 14:25

Selecting Unique Values Not in the Substring

5 Answers5