1

I need advice on how to break up this query so it's not so slow. I've been reading up on EXISTS and UNION but I'm not sure how they can be used in this situation. LineA/B/C are all nvarchar(max) and I can't change the join to be on something other than text. Has to be line this, unfortunately.

UPDATE n SET AddressID = a.AddressID
FROM #NewAddress n
JOIN dbo.Address a
    ON (a.[LineA] = n.[LineA] OR (a.[LineA] is null AND n.[LineA] is null))
    AND (a.[LineB] = n.[LineB] OR (a.[LineB] is null AND n.[LineB] is null))
    AND (a.[LineC] = n.[LineC] OR (a.[LineC] is null AND n.[LineC] is null))
WHERE n.Processed = 0

Any ideas on how to UNION or EXISTs this query? This seems to be a pretty common join condition I'm encountering

I'm not sure how to post a query plan, but it's really small like this

SELECT 0% <---- Hash Match (Inner Join) 75% <------- Clustered Index Scan 0%
                                            <------- Clustered Index Scan 25%

None of these rows are indexed as they are nvarchar(max). I can't index dbo.Address either

-- RESOLVED

I added an index to the Processed column in #NewAddress. Apparently, that was the slow part

user1002479
  • 235
  • 2
  • 4
  • 12
  • Can `LineA` really be null? If `LineA` means `AddressLine1` I'm highly skeptical of your data integrity... – canon May 07 '13 at 18:49
  • Yes, it can be null. Very sketchy I know. This is the hand I've been dealt – user1002479 May 07 '13 at 18:57
  • Are there any indexes already on dbo.Address? Can you give us some idea of the numbers of records on each table? Do you actually want all the fields from both tables in your select? –  May 07 '13 at 19:03
  • Are there other fields except lines `A` through `C`? Do you need them selected? – Quassnoi May 07 '13 at 19:05
  • There's a primary key on Address (AddressID), but I'm only provided the LineA/LineB/LineC values and I have to match it up. The rows are nvarchar(max). I'm actually updating #NewAddress with the Address.AddressID value – user1002479 May 07 '13 at 19:05
  • Can you give us some idea of the numbers of records on each table? Do you have the privileges necessary to create either indexes or indexed views? –  May 07 '13 at 19:10
  • I don't have the privileges to do so, no. There's typically around 1k rows in #NewAddress and around 200k in Address – user1002479 May 07 '13 at 19:12
  • 1
    Does coalesce() yield any improvement? `on coalesce(a.LineA,'') = coalesce(n.LineA,'')` And what happens if the addresses match but on different lines, e.g. LineA matches LineB? – Tim May 07 '13 at 19:17
  • @Tim: this treats string `''` and actual `NULL` as same. See here why shouldn't you rely on this: http://stackoverflow.com/questions/4456438/how-can-i-pass-the-string-null-through-wsdl-soap-from-as3-to-coldfusion-web – Quassnoi May 07 '13 at 19:19
  • @Quassnoi: not sure what you mean. If the column value `is null` I am substituting the string `` in order to be able to use the "=" operator. That is, I am also adding angle brackets as part of the string. – Tim May 07 '13 at 19:23
  • @Tim: What if the string actually has `` in it? –  May 07 '13 at 19:24
  • @Tim - I changed to `COALESCE(a.LineB, '') = COALESCE(b.LineB, '')` for readability. This is the same as `(a.[LineB] = n.[LineB] OR (a.[LineB] is null AND n.[LineB] is null))`, right? – user1002479 May 07 '13 at 19:26
  • Can you create an indexed view? This does not affect the original tables and does not require write or modify permission on them. – Quassnoi May 07 '13 at 19:28
  • Then it would match, and I think in this particular instance the OP might actually want a truly absent value in LineA (i.e. NULL) to match the string in LineB. This programming paradigm is known as "fighting shite with shite", though some prefer "fire". :-) – Tim May 07 '13 at 19:28
  • user1002479: Do you have the privileges to create an indexed temporary table? –  May 07 '13 at 19:34
  • @user1002479: COALESCE returns the first non-null value. In your case you're returning zero-length string.Would work the same. I was choosing (open angle bracket n-u-l-l close angle bracket) to make the substitution of a string for a null as explicit as possible (to someone who happened upon your code). But see Quassnoi's and Mark Bannister's caveat above. In this particular case I don't think the "false positives" would matter very much. But you have to decide if you want a zero-length string present in the original data to match a NULL. – Tim May 07 '13 at 19:34
  • Alright. Since there are no zero-length strings present, I'm safe to use COALESCE(a.LineA, '')? Thanks. Whoever had that as the answer but deleted it, please repost it. Even though it didn't solve the issue, it helped with the readability – user1002479 May 07 '13 at 19:42
  • @user1002479 it doesn't much improve readability and it'll hurt performance if you're currently relying on indexes. – canon May 07 '13 at 20:25

1 Answers1

0

You can't really improve this query much without indexing, but using this syntax:

SELECT  *
FROM    a
JOIN    b
ON      EXISTS
        (
        SELECT  a.lineA, a.lineB, a.lineC
        INTERSECT
        SELECT  b.lineA, b.lineB, b.lineC
        )

you give the optimizer ability to use MERGE JOIN which theoretically can be more efficient.

See this SQLFiddle (with actual MERGE JOIN in the plan)

Quassnoi
  • 413,100
  • 91
  • 616
  • 614