1

Hello everyone and welcome to my nightmare. I am preparing a database in order to run a number of SSIS packages to migrate the required data into a data warehouse.

I am in the process of updating customer names with accented characters to non-accented characters prior to running the SSIS package, as this causes my DimCustomer Population package to fail.

The query I used ran and ran quickly and efficiently (I was living the dream) however, I am now faced with three records which cause my UPDATE statement to run and run to the point I have to cancel the query after about an hour as this is nowhere near acceptable.

The characters contained in these names are: í á and . My suspicion is that these are only recognised in a particular format (which I cannot seem to find).


The queries and update I run are below for your perusal:

SELECT lastname COLLATE latin1_general_ci_ai      AS 'LastName',
       MIN(lastname COLLATE latin1_general_ci_as) AS 'CorrectLastName',
       MAX(lastname COLLATE latin1_general_ci_as) AS 'IncorrectLastName'
INTO   #lastname
FROM   learner WITH (nolock)
GROUP  BY lastname COLLATE latin1_general_ci_ai
HAVING MIN(lastname COLLATE latin1_general_ci_as) <> MAX(lastname COLLATE latin1_general_ci_as);

IF Object_id ('tempdb..#LastNameUpdate') IS NOT NULL
  DROP TABLE #lastnameupdate

SELECT L.learnerid,
       L.lastname,
       LN.correctlastname,
       LN.incorrectlastname
INTO   #lastnameupdate
FROM   learner L WITH (nolock)
       INNER JOIN #lastname AS LN
               ON L.lastname = LN.incorrectlastname

UPDATE L
SET    L.lastname = LNU.correctlastname
FROM   learner L
       INNER JOIN #lastnameupdate AS LNU
               ON L.learnerid = LNU.learnerid  

Please ask any questions if you need anymore info, I am a regular stack checker for pre-posted solutions and (even if a little sad) general reading. Any information, guidance or advise would be greatly appreciated. Thanks!

TT.
  • 15,774
  • 6
  • 47
  • 88
Will
  • 228
  • 1
  • 2
  • 15
  • 1
    In [one of my answers](http://stackoverflow.com/a/32048968/5089204) I offered a function which splits any string into single characters and re-concatenates this string after self-defined replacement rules... This is - for sure! - not fast with bigger data, but you might have a look on it... – Shnugo Oct 27 '16 at 10:27

1 Answers1

0

Just for speeding things up: you can add the following after creating the #lastname table.

Delete from #lastname where CorrectLastName = incorrectLastName

I think if you have only a few incorrect last names this will speed a lot up.

You can also try to avoid the creation of the table #lastnameupdate.
Because with a the use of a sub query instead you could do the same.
This should also speed the whole up a little bit.

Bert Levrau
  • 975
  • 8
  • 11