0

I'm trying to determine how to cleanup and populate a set of address lines in a SQL Server query. (Needs to be compatible with SQL Server 2008 and up)

I have 5 address lines Addr1 - Addr5 that I want to output into 4 consecutive populated columns in a query. If N addresses are non-null, then the final addresses FAddr1 thru FAddrN should be populated. (Any remaining columns should be null. If all 5 address lines are populated then the last one is dropped.)

Note that I might add a 6th Addr column later, but this doesn't have to be generically solved for any N columns. (6 Addr columns should be the max.) The data can't be standardized beforehand, so I have to do this on the fly in the fastest way possible.

The complexity is how to shift the address line columns "to the left" to overwrite the blank columns.

Here's an example:

Addr1           Addr2           Addr3           Addr4           Addr5           
--------------------------------------------------------------------------------
                101 Main St     Apt 1           Here, AZ 11111  USA
                102 Main St                     Here, AZ 22222
103 Main St     Apt 3           Here, AZ 33333  USA
104 Main St                                     Here, AZ 44444  USA

For these examples I would like to output:

FAddr1          FAddr2          FAddr3          FAddr4          
----------------------------------------------------------------
101 Main St     Apt 1           Here, AZ 11111
102 Main St     Here, AZ 22222
103 Main St     Apt 3           Here, AZ 33333  USA
104 Main St     Here, AZ 44444  USA

I've written this in C#, but the complexity is doing this in SQL in a clean way. It's much easier (see related questions below) to do this by concatenating into one string. But I need these in 4 separate columns.

Related questions: Most Similar Question - Different Use Case / Similar But With String Instead of Columns

Initial Version:

I've written a version below in SQL using a test case that seems to be working, but I've hoping there is a cleaner way. This basically keeps shifting the non-null elements. But it takes a sub-select per column to shift everything to the left.

Select
    Addr1,
    case when Addr2 is null then Addr3 else Addr2 end as Addr2,
    case when Addr2 is null then null else Addr3 end as Addr3,
    case when Addr4 is null then Addr5 else Addr4 end as Addr4
    -- Truncate to 4 - case when Addr4 is null then null else Addr5 end as Addr5
From
(
Select
    case when Addr1 is null then Addr2 else Addr1 end as Addr1,
    case when Addr1 is null then null else Addr2 end as Addr2,
    case when Addr3 is null then Addr4 else Addr3 end as Addr3,
    case when Addr3 is null then null else Addr4 end as Addr4,
    Addr5
From
(
Select
    Addr1,
    case when Addr2 is null then Addr3 else Addr2 end as Addr2,
    case when Addr2 is null then null else Addr3 end as Addr3,
    case when Addr4 is null then Addr5 else Addr4 end as Addr4,
    case when Addr4 is null then null else Addr5 end as Addr5
From
(
Select
    case when Addr1 is null then Addr2 else Addr1 end as Addr1,
    case when Addr1 is null then null else Addr2 end as Addr2,
    case when Addr3 is null then Addr4 else Addr3 end as Addr3,
    case when Addr3 is null then null else Addr4 end as Addr4,
    Addr5
From
(
Select
    Addr1,
    case when Addr2 is null then Addr3 else Addr2 end as Addr2,
    case when Addr2 is null then null else Addr3 end as Addr3,
    case when Addr4 is null then Addr5 else Addr4 end as Addr4,
    case when Addr4 is null then null else Addr5 end as Addr5
From
(
    SELECT NULL as Addr1, NULL as Addr2, NULL as Addr3, NULL as Addr4, NULL  as Addr5 UNION
    SELECT NULL, NULL, NULL, NULL,  '5' UNION
    SELECT NULL, NULL, NULL,  '4', NULL UNION
    SELECT NULL, NULL, NULL,  '4',  '5' UNION
    SELECT NULL, NULL,  '3', NULL, NULL UNION
    SELECT NULL, NULL,  '3', NULL,  '5' UNION
    SELECT NULL, NULL,  '3',  '4', NULL UNION
    SELECT NULL, NULL,  '3',  '4',  '5' UNION
    SELECT NULL,  '2', NULL, NULL, NULL UNION
    SELECT NULL,  '2', NULL, NULL,  '5' UNION
    SELECT NULL,  '2', NULL,  '4', NULL UNION
    SELECT NULL,  '2', NULL,  '4',  '5' UNION
    SELECT NULL,  '2',  '3', NULL, NULL UNION
    SELECT NULL,  '2',  '3', NULL,  '5' UNION
    SELECT NULL,  '2',  '3',  '4', NULL UNION
    SELECT NULL,  '2',  '3',  '4',  '5' UNION
    SELECT  '1', NULL, NULL, NULL, NULL UNION
    SELECT  '1', NULL, NULL, NULL,  '5' UNION
    SELECT  '1', NULL, NULL,  '4', NULL UNION
    SELECT  '1', NULL, NULL,  '4',  '5' UNION
    SELECT  '1', NULL,  '3', NULL, NULL UNION
    SELECT  '1', NULL,  '3', NULL,  '5' UNION
    SELECT  '1', NULL,  '3',  '4', NULL UNION
    SELECT  '1', NULL,  '3',  '4',  '5' UNION
    SELECT  '1',  '2', NULL, NULL, NULL UNION
    SELECT  '1',  '2', NULL, NULL,  '5' UNION
    SELECT  '1',  '2', NULL,  '4', NULL UNION
    SELECT  '1',  '2', NULL,  '4',  '5' UNION
    SELECT  '1',  '2',  '3', NULL, NULL UNION
    SELECT  '1',  '2',  '3', NULL,  '5' UNION
    SELECT  '1',  '2',  '3',  '4', NULL UNION
    SELECT  '1',  '2',  '3',  '4',  '5'
) as Base
) as Pass1 ) as Pass2 ) as Pass3 ) as Pass4

Dale K
  • 25,246
  • 15
  • 42
  • 71
Russell
  • 1
  • 1

2 Answers2

0

this should work for you:

with Temp1 (AddrStr) as (
select
TRIM(CONCAT( Addr1 + ', ', Addr2 + ', ', Addr3 + ', ', Addr4 + ', ', Addr5)) AS [AddrStr]
From
(
... code to select the 5 address columns

)
) 
as base
)
select
AddrStr
,REVERSE(PARSENAME(REPLACE(REVERSE(AddrStr), ',', '.'), 1)) AS [Addr1]
,REVERSE(PARSENAME(REPLACE(REVERSE(AddrStr), ',', '.'), 2)) AS [Addr2]
,REVERSE(PARSENAME(REPLACE(REVERSE(AddrStr), ',', '.'), 3)) AS [Addr3]
,REVERSE(PARSENAME(REPLACE(REVERSE(AddrStr), ',', '.'), 4)) AS [Addr4]
,REVERSE(PARSENAME(REPLACE(REVERSE(AddrStr), ',', '.'), 5)) AS [Addr5]
From Temp1

You could, of course, simplify this by replacing "AddrStr" in the REVERSE... statements with the TRIM(CONCAT ... statement - but I've split it into 2 steps for the sake of clarity e.g.

,REVERSE(PARSENAME(REPLACE(REVERSE(TRIM(CONCAT( Addr1 + ', ', Addr2 + ', ', Addr3 + ', ', Addr4 + ', ', Addr5))), ',', '.'), 1)) AS [Addr1]
NickW
  • 8,430
  • 2
  • 6
  • 19
  • Quick comments from tests: 1) This method is FAST, it's just as fast as my original attempt, but much shorter. 2) It looks like PARSENAME() will only grab the first 3 components of the identifier. I think I can code around this. [Maybe this](https://forums.sqlteam.com/t/is-parsename-safe-for-general-parsing-and-splitting/5990/4) 3) I have both dots and commas in my addresses. This one is trickier since my addresses might have 6+ commas / dots in them. I can use CHAR(1) instead of comma. 4) I'd rather not to concat strings due to the edge cases, but maybe this is robust enough. – Russell Nov 25 '20 at 03:32
0

You can use apply here:

select b.*, v.*
from base b cross apply
     (select max(case when v.seqnum = 1 then v.addr end) as faddr1,
             max(case when v.seqnum = 2 then v.addr end) as faddr2,
             max(case when v.seqnum = 3 then v.addr end) as faddr3,
             max(case when v.seqnum = 4 then v.addr end) as faddr4,
             max(case when v.seqnum = 5 then v.addr end) as faddr5
      from (select v.*, row_number() over (order by ord) as seqnum
            from (values (b.addr1, 1), (b.addr2, 2), (b.addr3, 3), (b.addr4, 4), (b.addr5, 5)
                 ) v(addr, ord)
            where v.addr is not null
           ) v
      ) v;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Quick comments from tests: 1) This method is CLEAN, much shorter, and more robust than my original. 2) On my larger dataset this takes 5-10x longer to run than my original / string parsing methods both of which are mostly Compute Scalars. Any ideas how to make this faster? 3) It returns this message: "Warning: Null value is eliminated by an aggregate or other SET operation." I didn't see a way offhand to fix. 4) Also any good reference on over & cross apply? Googling hasn't led me to a clear resource that explains this clearly to me. – Russell Nov 25 '20 at 03:49
  • @Russell . . . Don't worry about the warning. I am surprised this is 5-10x slower than your version. Usually lateral joins are relatively efficient (slower, but not that much slower). – Gordon Linoff Nov 25 '20 at 14:51
  • @GordonLonoff Good deal on the warning, thanks. It might be something about my dataset that makes it slower, 90% of the time only Addr1 and 4 are populated and the rest are null. But I have to handle all cases. I ran it both ways a few times and cycled SQL Server to clear caching, so it wasn't a fully scientific test, but a rough idea. It was on a bit less than 1 million records. – Russell Nov 25 '20 at 15:41
  • @Russell . . . .Given the way that your query is optimized, I am skeptical that you can construct anything faster. – Gordon Linoff Nov 25 '20 at 17:02