I have a column-based DB2-Blu table (DB2-Blu fixes indexing by itself) with 30.000 peer-groups and 50.000 values per peer group, total 1.5 billion rows.
I did a test to compare the run-time with two different processes: Proc Summary
and Proc Sql
like below:
proc summary data = table_blu nodisplay nway missing chartype;
/* Var1 and var2 are toghether peer-group */
class var1 var2 / groupinternal;
var values;
output out = stattable
sum=sum
;
run;
Calculating only the sum-up through the peer-groups and it took around 14 min
Then I did the similar calculation with Proc SQL
like below:
proc sql;
create table stattable as
select var1, var2, sum(values);
from table_blu
group by var1, var2;
quit;
This Proc SQL
took only 1,26 min
. It is definitely much less run_time comparing to the 14 min
with Proc Summary
process.
How can it be so big difference? Can it happen because the table is column-based? Maybe SAS is not optimal for column-based tables?