After applying the given suggestions, and also speaking with our DBA, the winning idea was to ditch the merge and use conditional logic inside the loop.
Adding begin/commit seemed to reduce execution time by 1.5 to 3 seconds.
Adding a primary key to the target table gave the biggest single improvement, cutting execution time to about 13 seconds.
Converting the merge to conditional logic was the best option overall, achieving the result in about 8 seconds.
When using conditionals, the primary key on the target table is slightly detrimental (around 1 second), but having it drastically reduces time afterwards, since this table is only an intermediate step before a big join. (That is, the result of this record-merging is later used in a join with 11+ tables.) So I kept the P.K.
Since there seems to be no solution without a cursor loop, I used the conditionals to merge the values into variables and issue only inserts to the target table, eliminating the need to seek a record to update or to check its existence.
Here is a simplified example.
create table #source_t(account varchar(10), event varchar(10));
Insert into #source_t(account, event) values ('account1','event 1');
Insert into #source_t(account, event) values ('account1','event 2');
Insert into #source_t(account, event) values ('account1','event 3');
Insert into #source_t(account, event) values ('account2','came');
Insert into #source_t(account, event) values ('account2','saw');
Insert into #source_t(account, event) values ('account2','conquered');
create table #target(
account varchar(10), -- make primary key if the result is to be joined afterwards.
event_list varchar(2048)
);
declare ciclo cursor for
select account, event
from #source_t c
order by account --,...
for read only;
declare @account varchar(10), @event varchar(10), @last_account varchar(10), @event_list varchar(2048) -- sizes matched to the source/target columns to avoid silent truncation
open ciclo
fetch ciclo into @account, @event
set @last_account = @account, @event_list = null
begin tran
while @@sqlstatus = 0 BEGIN
if @last_account <> @account begin -- if current record's account is different from previous, insert into table the concatenated event string
insert into #target(account, event_list) values (@last_account, @event_list)
set @event_list = null -- Empty the string for the next account
end
set @last_account = @account -- Copy current account to the variable that holds the previous one
    set @event_list = case when @event_list is null then @event else @event_list + ' | ' + @event end -- Concatenate events with separator. Use a searched CASE: the simple form "case @event_list when null" compares with =, which does not match NULL when ansinull is on.
fetch ciclo into @account, @event
END
-- After the last fetch, @@sqlstatus becomes nonzero and the loop ends, but the values of the last account remain in the variables, so insert them here.
insert into #target(account, event_list) values (@last_account, @event_list)
commit tran
close ciclo
deallocate cursor ciclo;
select * from #target;
drop table #target;
drop table #source_t;
Result:
account  | event_list
---------|----------------------------
account1 | event 1 | event 2 | event 3
account2 | saw | came | conquered
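Outside ASE, the same control-break pattern is easy to verify. Below is a minimal sketch in Python over an in-memory SQLite database (my illustration, not the production code; table and column names mirror the example above). It walks the rows in account order and flushes the accumulated string whenever the account changes, exactly like the cursor loop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table source_t(account text, event text);
    insert into source_t values
        ('account1','event 1'), ('account1','event 2'), ('account1','event 3'),
        ('account2','came'), ('account2','saw'), ('account2','conquered');
    create table target(account text primary key, event_list text);
""")

# Control-break loop: accumulate events per account, insert on change of account.
last_account, event_list = None, None
for account, event in conn.execute(
        "select account, event from source_t order by account, rowid"):
    if last_account is not None and account != last_account:
        conn.execute("insert into target values (?, ?)", (last_account, event_list))
        event_list = None
    last_account = account
    event_list = event if event_list is None else event_list + " | " + event

# The loop ends with the last account still in the variables: flush it too.
if last_account is not None:
    conn.execute("insert into target values (?, ?)", (last_account, event_list))

for row in conn.execute("select account, event_list from target order by account"):
    print(row)
# ('account1', 'event 1 | event 2 | event 3')
# ('account2', 'came | saw | conquered')
```

Note that the order of events within each account depends on the ORDER BY; here I added rowid to make it deterministic, which is why account2 comes out as came | saw | conquered rather than the order shown above.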
This code worked fast enough in my real use case. However, it could be further optimized by filtering the source table so it holds only the values that would be needed for the later join. To that end, I saved the final joined resultset (minus the join with #target) in another temp table, leaving some columns blank. Then #source_t was filled using only the accounts present in that resultset, processed it into #target, and finally used #target to update the final result. With all that, execution time in the production environment dropped to around 8 seconds (including all steps).
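The pre-filtering idea above boils down to an INSERT ... SELECT restricted to the accounts already present in the partial resultset. A small sketch, again in Python/SQLite for illustration only (the names big_source and result_stub are hypothetical stand-ins for the real tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table big_source(account text, event text);
    -- result_stub stands in for the partially joined resultset (event_list left blank)
    create table result_stub(account text);
    create table source_t(account text, event text);
    insert into big_source values
        ('account1','event 1'), ('account2','came'), ('account9','noise');
    insert into result_stub values ('account1'), ('account2');
""")

# Fill the working source only with accounts that appear in the final
# resultset, so the cursor loop later touches fewer rows.
conn.execute("""
    insert into source_t
    select s.account, s.event
    from big_source s
    where s.account in (select account from result_stub)
""")
print(conn.execute("select count(*) from source_t").fetchone()[0])  # -> 2
```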
UDF solutions have solved this kind of problem for me before, but in ASE 15 they have to be table-specific, and you need to write one function per column. Also, that is only possible in the development environment; it is not authorized for production because of read-only privileges.
In conclusion, a cursor loop combined with a merge statement is a simple solution for combining records by concatenating certain values. A primary key, or an index covering the columns used for the match, is required for acceptable performance.
Conditional logic yields even better performance, but at the cost of more complex code (the more you code, the more error-prone it is).