9

I am trying to create a new data extract from a (badly designed) sql database. The customer requires that I add a distinctidentifier which I am attempting to do using the NEWID() function. Unfortunately this leads to multiple duplicate records being returned.

After a bit of research I have found that the NEWID() function does indeed 'undo' the use of the distinct keyword, but I cannot work out why or how to overcome this.

An example of the query I am trying to write is as follows:

select distinct

    NEWID() as UUID
    ,Histo_Results_File.ISRN
    ,Histo_Results_File.Internal_Patient_No
    ,Histo_Results_File.Date_of_Birth
    ,Histo_Result_freetext.histo_report
    ,Histo_Report.Date_Report_Updated  as [Investigation_Result_Date]

from apex.Histo_Results_File
            inner join apex.Histo_Report on (Histo_Report.Histo_Results_File = Histo_Results_File.ID)

If I miss out the NEWID() line in the select block, I get 569 records returned, which is correct, but if I include that line then I get in excess of 30,000 which are all duplicates of the original 569 but with different IDs. Can anyone suggest a way around this problem?

Thanks in advance

Smoggie Tom
  • 306
  • 4
  • 12

3 Answers3

8

Use a sub query would be the easiest way to do it.

SELECT NEWID() as UUID
, * -- this is everything from below
FROM (
select distinct
     Histo_Results_File.ISRN
    ,Histo_Results_File.Internal_Patient_No
    ,Histo_Results_File.Date_of_Birth
    ,Histo_Result_freetext.histo_report
    ,Histo_Report.Date_Report_Updated  as [Investigation_Result_Date]

from apex.Histo_Results_File
            inner join apex.Histo_Report on (Histo_Report.Histo_Results_File = Histo_Results_File.ID)) as mySub
Igor
  • 60,821
  • 10
  • 100
  • 175
1
select NEWID() as UUID
    ,ISRN
    ,Internal_Patient_No
    ,Date_of_Birth
    ,histo_report
    ,Investigation_Result_Date
from (
select distinct
    ,Histo_Results_File.ISRN
    ,Histo_Results_File.Internal_Patient_No
    ,Histo_Results_File.Date_of_Birth
    ,Histo_Result_freetext.histo_report
    ,Histo_Report.Date_Report_Updated  as [Investigation_Result_Date]

from apex.Histo_Results_File
            inner join apex.Histo_Report on (Histo_Report.Histo_Results_File = Histo_Results_File.ID)) t
openshac
  • 4,966
  • 5
  • 46
  • 77
0

You can use a sub-query to get around the issue, something like.....

SELECT NEWID() as UUID
      ,*
FROM (
select distinct
     Histo_Results_File.ISRN
    ,Histo_Results_File.Internal_Patient_No
    ,Histo_Results_File.Date_of_Birth
    ,Histo_Result_freetext.histo_report
    ,Histo_Report.Date_Report_Updated  as [Investigation_Result_Date]

from apex.Histo_Results_File
            inner join apex.Histo_Report 
 on (Histo_Report.Histo_Results_File = Histo_Results_File.ID)
 ) t
M.Ali
  • 67,945
  • 13
  • 101
  • 127