1

I have the following file name changing log table.

ChangeNameLog(Date, OldName, NewName)
Primary key: Date, OldName

The data of the table looks like

Date OldName NewName 
1/1  aaa     bbb
1/2  bbb     ccc
1/3  ccc     bbb
1/4  bbb     ddd
2/1  xx      yy
2/2  yy      zz

(The file name aaa was changed to bbb, and then to ccc, bbb, ddd later
The file name xx was changed to yy, and then to zz)

I want to get the oldest name for all the new names. The result will look like

Date NewName OldestName
1/2  bbb     aaa
1/3  ccc     aaa
1/4  ddd     aaa
2/1  yy      xx
2/2  zz      xx

Is anyway to write a Transact-SQL (Version 2008 is fine) without using cursor to loop the logging table?

The following SQL can be used to prepare the data.

declare @log table (
    Date Date, OldName varchar(20), NewName varchar(20) not null 
    primary key (Date, OldName)
);
-- The real table also have the following CK
-- create unique index IX_CK on @log (Date, NewName)

insert into @log values
 ('2012-01-01', 'aaa', 'bbb')
,('2012-01-02', 'bbb', 'ccc')
,('2012-01-03', 'ccc', 'bbb')
,('2012-01-04', 'bbb', 'ddd')
,('2012-01-05', 'ddd', 'eee')

,('2012-01-03', 'xx',  'yy')
,('2012-02-02', 'yy',  'zz')
,('2012-02-03', 'zz',  'xx')
;
ca9163d9
  • 27,283
  • 64
  • 210
  • 413
  • 1
    I'm not sure, but I'm fairly certain you'll need to be more specific and in-depth with your explanation of the problem you're facing to receive any helpful answers. –  Feb 24 '12 at 23:47
  • Have a look at: http://stackoverflow.com/questions/1757370/recursive-same-table-query-in-sql-server-2008 which deals with the same thing (recursive query) – Eddy Feb 24 '12 at 23:57
  • Do you really not have any other information in the log that identifies a file? What stops you from adding a row like `('2012-02-04', 'xx', 'eee')`? Then the next rename from `'eee' -> anything else`, which one does it belong to? – Aaron Bertrand Feb 25 '12 at 05:13
  • @AaronBertrand It will not be possible to have a row like `('2012-02-04', 'xx', 'eee')` if there is already a file named `eee`. If `eee` didn't exist anymore, then "Anything else" should have the original name of 'xx'. – ca9163d9 Feb 25 '12 at 05:30
  • How are you constraining this? Surely you have some way to identify a file other than its name... – Aaron Bertrand Feb 25 '12 at 05:32
  • In other words, if you have 10000 different files represented in this table, how do you prevent a duplicate from being inserted? Do you really check all 10000 files and make sure that their chain didn't end up with that name for the last row? I think some others had this right - you need to explain the business problem a little better and help us understand why you only want to recognize files by following some chain from their original name to their current name. – Aaron Bertrand Feb 25 '12 at 06:01
  • @AaronBertrand the table is generated from daily snapshop of the directory. There are some pre-checks/processes to make sure the data is good. – ca9163d9 Feb 25 '12 at 06:09
  • So you take a snapshot of the directory at some point in time, that doesn't come anywhere near explaining how you know that 'xx' used to be 'zz'? Do you see why it's hard for us to understand how "the data is good"? We have no idea how you're relating an old filename to a new one. If you took two snapshots five minutes apart, you must have some way to know that 'zz' was the file renamed to 'xx' and 'ddd' was the file renamed to 'eee'... – Aaron Bertrand Feb 25 '12 at 06:44
  • @AaronBertrand, only one snapshot a day. And there are pre-checks and two candidate keys. Just think it's a pure SQL excise on a well formed data. Forget about file names, create date. Just think it as a sequence of events of status changes. – ca9163d9 Feb 25 '12 at 06:47
  • But it's not really well-formed data if all you have is this chain of loosely related rename events. But I give up, I can't get blood from a stone. – Aaron Bertrand Feb 25 '12 at 06:49

3 Answers3

3

Setup:

declare @logtable table (Date date, OldName nvarchar(200), NewName varchar(200))

insert into @logtable values (convert (date, '1/1/12', 1), 'aaa', 'bbb')
insert into @logtable values (convert (date, '1/2/12', 1), 'bbb', 'ccc')
insert into @logtable values (convert (date, '1/3/12', 1), 'ccc', 'bbb')
insert into @logtable values (convert (date, '1/4/12', 1), 'bbb', 'ddd')
insert into @logtable values (convert (date, '2/1/12', 1), 'xx', 'yy')
insert into @logtable values (convert (date, '2/2/12', 1), 'yy', 'zz')

Now on to recursive CTE. First part (backtrack) recurses over log table matching past names and keeping information on chain top (EndName). Second part, starters, assigns row numbers to EndName by changedates, and finally only the oldest records are displayed. This part might be expressed in more ways, using not exist on changedate, or keeping original name in every log entry, but i would investigate another approach only if this code proves to be too slow.

; with backtrack as (
    select NewName EndName, NewName, OldName, Date
        from @logtable
    union all
    select EndName, [@logtable].NewName, [@logtable].OldName, [@logtable].Date
    from @logtable inner join backtrack
        on [@logtable].NewName = backtrack.OldName
        and [@logtable].Date < backtrack.Date
),
starters as (
    select EndName NewName, OldName, Date, ROW_NUMBER() over (partition by EndName order by Date) RowNumber
    from backtrack
)
select NewName, OldName
from starters
where RowNumber = 1

I hope that this will help you.

Nikola Markovinović
  • 18,963
  • 5
  • 46
  • 51
0

Or, for a different headache that plods through forwards:

declare @Helga as table ( Date datetime, OldName varchar(10), NewName varchar(10) )
insert into @Helga ( Date, OldName, NewName ) values
  ( '1/1/12', 'aaa', 'bbb' ), ( '1/2/12', 'bbb', 'ccc' ), ( '1/3/12', 'ccc', 'bbb' ),
  ( '1/4/12', 'bbb', 'ddd' ), ( '2/1/12', 'xx', 'yy' ), ( '2/2/12', 'yy', 'zz' )
select * from @Helga

; with Edmund as
( -- Get the oldest names.
  select L.Date, L.OldName, L.NewName, L.OldName as Methuselah, cast( 0 as bigint ) as Ethyl
    from @Helga as L left outer join
      @Helga as R on R.NewName = L.OldName
    where R.NewName is NULL
  union all
  -- Add newer names one generation at a time.
  select H.Date, H.OldName, H.NewName, H.Methuselah, H.Sandy
    from ( select iH.Date, iH.OldName, iH.NewName, Ed.Methuselah, Row_Number() over ( order by iH.Date ) as Sandy
      from Edmund as Ed cross join
        @Helga as iH where iH.OldName = Ed.NewName and iH.Date > Ed.Date ) as H
    where H.Sandy = 1
)
select Date, OldName, NewName, Methuselah
  from Edmund
  order by Methuselah, Date

Of course, this would be easier and more reliable if you assigned a consistent identification to each file that you persist across name changes. When you have NY > NJ > MA > CA crossing paths with MN > MA > CA > AL all bets are off. If the first sequence carried FileId 1 and the second was all 2 you could still sort out the details.

HABO
  • 15,314
  • 5
  • 39
  • 57
0

My own solution:

declare @log table (
    Date Date, OldName varchar(20), NewName varchar(20) 
    primary key (Date, OldName)
);

insert into @log values
 ('2012-01-01', 'aaa', 'bbb')
,('2012-01-02', 'bbb', 'ccc')
,('2012-01-03', 'ccc', 'bbb')
,('2012-01-04', 'bbb', 'ddd')
,('2012-01-05', 'ddd', 'eee')

,('2012-01-03', 'xx',  'yy')
,('2012-02-02', 'yy',  'zz')
,('2012-02-03', 'zz',  'xx')
;

;with m as (
    select Date, OldName, NewName, 1 as L
    from @log 
    union all
    select l.Date, m.OldName, l.NewName, L + 1
    from @log l join m on l.Date > m.Date and l.OldName = m.NewName
)
select * 
from m
where L = (select MAX(l) from m m1 where NewName = m.NewName and Date = m.Date)
order by 1

Output:
The following result shows that the two original names are aaa and xx.

Date       Orig Name L
2012-01-01  aaa bbb 1
2012-01-02  aaa ccc 2
2012-01-03  aaa bbb 3
2012-01-03  xx  yy  1
2012-01-04  aaa ddd 4
2012-01-05  aaa eee 5
2012-02-02  xx  zz  2
2012-02-03  xx  xx  3
ca9163d9
  • 27,283
  • 64
  • 210
  • 413
  • I don't understand how this is a "solution" - it doesn't give output anything like what you've listed as the expected result in the question. – Aaron Bertrand Feb 25 '12 at 06:52