2

I am trying to standardize street names in my address/contact tables. I'd like to do so for "Saint" vs "St". I have a list of several 100 Saints. The tables I want to update are large and I'd like to do the most efficient update that allows me in effect to say replace St with saint where the address fiekd contains any "Saint" followed by any of the following Names (Name1,...,NameN)

Even just the selects I've done are very slow, I even tried a regex e.g.

Address Regexp 'Saint (Saint1|...|SaintN)'

So I'm looking for some efficient way to replace "St" with "Saint" in several million records using a condition of "St " concatenated with several hundred possible Saint names.

user3649739
  • 1,829
  • 2
  • 18
  • 28
  • So far since I have no answer I'm working on a query that can reduce my record set to just those with "St %" OR "% St %" even though those may be 'Street' to at least reduce the overhead necessary to process this. However I'm still of the mind that there must be a more efficient way to process this. – user3649739 Apr 19 '16 at 14:39

1 Answers1

0

Okay, the only clear way i see to achieve what you want is to use IN but this still will be quite slow:

UPDATE table
SET address = REPLACE(address, 'st', 'saint')
WHERE address IN ('st name1','st name2','st name3')

You can try speeding this query up a bit by giving the address an index. This will makes queries using the address column to filter a bit faster.

I still suspect this will be quite slow if you have 2 milions of rows.

Jester
  • 1,408
  • 1
  • 9
  • 21
  • If this still won't do and you want to use LIKE with IN then i have bad news because that would require a lot of writing: http://stackoverflow.com/questions/1865353/combining-like-and-in-for-sql-server – Jester Apr 19 '16 at 14:43
  • Thanks, I am going to compare the IN Select vs a RegexP Select which is my only other option, will test on 100,000 records and report back – user3649739 Apr 19 '16 at 15:03
  • sure thing, make an index for the address column if you haven't yet. It might help somewhat. – Jester Apr 19 '16 at 15:05
  • Well Regex gained me very little even using ^ for end of field, likewise IN. What I ended up doing so far (haven't committed as answer yet but testing) is simply make 500 Updates using each Saint in it'w own Update/Where. It might be 6 of one 1/2 dozen of another not sure but seems to me running 500 different discrete update/wheres on 1 million records beats 1 query with 500 OR conditions. Like seems better even when doing all 4 permutations ('St X %', '% St X', 'St X', '% St X %' – user3649739 Apr 19 '16 at 18:01
  • ugh, that sounds terrible :p i assume you're comparing them using `EXPLAIN` ? i guess just use explain on the multiple ways you think of and see which one is the fastest. to be fair, this cleaning is not something that you should do yourself manually. better to find a better source (in many countries you can get information like this from some government site, or some site that scraped this information for government database and asks some money for it's use) – Jester Apr 19 '16 at 18:10
  • Not so bad actually, constructing the query text file took a few minutes, now I'll set it to run and assuming from tests this will take anywhere about one hour. – user3649739 Apr 19 '16 at 19:00
  • why not test it first on smaller batches? waiting an hour for some test runs is not something i would do. – Jester Apr 19 '16 at 19:14
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/109596/discussion-between-user3649739-and-jester). – user3649739 Apr 19 '16 at 20:50