0

I have an access table with 11 columns.

  • PeriodYear
  • PeriodCycle
  • PeriodZone
  • PHSRCode
  • LastName
  • FirstName
  • PRC
  • Specialty
  • HCPType
  • Class
  • Institution

I want to find duplicate entries for it using Lastname, Firstname, and Institution as the basis where to find the duplicates.

Example Table:

    PeriodYear  PeriodCycle  PeriodZone  PHSRCode  Lastname  Firstname  PRC  Specialty  HCPType  Class  Institution

    2013        1            1           SF1-01    Almeda    Sancho     111  GP         CON      A      PGH
    2013        1            2           SF1-01    Almeda    Sancho     111  GP         CON      A      LPDH
    2013        1            3           SF1-01    Almeda    Sancho     222  GP         CON      A      PGH

The result should be:

    PeriodYear  PeriodCycle  PeriodZone  PHSRCode  Lastname  Firstname  PRC  Specialty  HCPType  Class  Institution

    2013        1            1           SF1-01    Almeda    Sancho     111  GP         CON      A      PGH
    2013        1            3           SF1-01    Almeda    Sancho     222  GP         CON      A      PGH

The query will be based on the fields PeriodYear, PeriodCycle, and PHSRCode, which will come from variables.

Is this possible?

Sancho Almeda
  • 151
  • 2
  • 14
  • The solution should include a `GROUP BY` as well as a `HAVING COUNT` clause. – 0xCAFEBABE Jul 17 '13 at 09:03
  • http://stackoverflow.com/questions/4434118/select-statement-to-find-duplicates-on-certain-fields – rags Jul 17 '13 at 10:50
  • There is a discrepancy between your two table samples, LPDH has become PGH0 - is this intentional? And why would it return 111 instead of 222 for PGH? You need to consider these things logically before attempting to create your query. – Andy G Jul 17 '13 at 14:44
  • @AndyG : That's a my bad with the PGH0. That should be PGH as the 3rd item from the example table. The duplicates only primarily occurs in the Lastname, Firstname and Institution. So the difference in PRC should be negligible unless the end-user wants to include that in which case I may be able to account for it from the answers given. – Sancho Almeda Jul 18 '13 at 01:10

1 Answers1

2

First create a "pointer" query that determines all records that have dupes:

SELECT LastName, FirstName, Institution, Sum(1) as CNT
FROM MyTableName
GROUP BY LastName, FirstName, Institution
HAVING (Sum(1) > 1)

Call that Query1 or something.

Then LEFT JOIN Query1 to MyTableName on those fields, and use a WHERE clause to only keep the data you want based on your variables:

SELECT B.*
FROM Query1 A
LEFT JOIN MyTableName B
ON A.LastName = B.LastName 
AND A.FirstName = B.FirstName
AND A.Institution = B.Institution
WHERE B.PeriodYear = Forms!frmMyForm!txtPeriodYear
AND B.PeriodCycle = Forms!frmMyForm!txtPeriodCycle
AND B.PHSRCode = Forms!frmMyForm!txtPHSRCode

You will obviously have to do some tweaking, but this should give you the right idea.

Johnny Bones
  • 8,786
  • 7
  • 52
  • 117
  • You are correct that I need to do some tweaking. A question though, would it be possible to already do WHERE clause from the "pointer" query or does it impact performance? – Sancho Almeda Jul 18 '13 at 07:33
  • I think you can do that safely, if anything it would improve performance as the 2nd query would be looking at less records. How many records are we talking about; 100's of thousands, or 100's of millions? – Johnny Bones Jul 18 '13 at 13:11
  • Statistically, we would be looking at about 10,000 records a month so that would accumulate to about 100,000+ in a year. And since we are preserving records at for about two years, the data would probably amount to 300,000+ records before it is archived. – Sancho Almeda Jul 19 '13 at 02:58
  • 1
    Shouldn't be a problem. I've dealt with 1.5B+ records, although at that point it's best to have the back end in SQL Server. Half a million records in Access should process pretty quickly. – Johnny Bones Jul 19 '13 at 10:25
  • Thanks. I've tried your query and modified it a bit for my needs and it is indeed what I was looking for. Thank you for the answer. – Sancho Almeda Jul 21 '13 at 11:48