6

I want to sort data in SAS data step. What exactly I mean is: the work of proc sort should be done in data step. Is there any solution?

Saran
  • 79
  • 1
  • 8
  • Would love to know why you would want to do such a thing. – Robert Penridge Dec 18 '17 at 21:45
  • Just got idea. There is no need to do like this in complicated way. But out of enthusiasm and anxiety I want to know: is it possible in data step. I searched in google but I didn't find any answer so I posted here. – Saran Dec 19 '17 at 07:24

3 Answers3

8

If you're looking for a data-step only solution, you can do the work of PROC SORT with a hash table. The caveat is that you need enough memory to do it.

If you want to do a simple sort, you would load the hash table with the ordered:'yes' option and output it to a new table. By default, ordered:yes will sort the data in ascending order. You can specify descending as well.

Simple Sort

data _null_;

    /* Sets up PDV without loading the table */
    if(0) then set sashelp.class;

    /* Load sashelp.class into memory ordered by Height. Do not remove duplicates. */
    dcl hash sortit(dataset:'sashelp.class', ordered:'yes', multidata:'yes');

        sortit.defineKey('Height');     * Order by height;
        sortit.defineData(all:'yes');   * Keep all variables in the output dataset;

    sortit.defineDone();

    /* Output to a dataset called class_sorted */
    sortit.Output(dataset:'class_sorted');
run;

De-duping

To remove duplicates, do the exact same operation, except remove the multidata option. In the below table, observations (8, 9) and (15, 16) are duplicates of each other. Observations 9 and 16 will be eliminated.

data _null_;

    /* Sets up PDV without loading the table */
    if(0) then set sashelp.class;

    /* Load sashelp.class into memory ordered by Height. Do not keep duplicates. */
    dcl hash sortit(dataset:'sashelp.class', ordered:'yes');

        sortit.defineKey('Height');     * Order by height;
        sortit.defineData(all:'yes');   * Keep all variables in the output dataset;
    sortit.defineDone();

    /* Output to a dataset called class_sorted */
    sortit.Output(dataset:'class_sorted');
run;
Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
3

Stu beat me to it, but provided that your dataset contains a unique key, and you can fit the whole thing in memory, you can use a hash sort, e.g.:

data _null_;
  if 0 then set sashelp.class;
  declare hash h(dataset:"sashelp.class",ordered:"a");
  rc = h.definekey("age","sex","name");
  rc = h.definedata(ALL:'yes');
  rc = h.definedone();
  rc = h.output(dataset:"class_sorted");
  stop;
run;

If you are really determined to avoid using any built-in sort methods, a particularly silly approach is to load the whole dataset into a series of temporary arrays, sort the arrays using a hand-coded algorithm, and export again:

https://codereview.stackexchange.com/questions/79952/quicksort-in-sas-for-sorting-datasets

user667489
  • 9,501
  • 2
  • 24
  • 35
1

There is solution in use proc ds2.

/*Just prepare dataset, because DS2 responds with an error on libraries like sashelp. */
data sql_prep;
set sashelp.class;
run;

/*Delete test dataset before ds2 func, to avoid errors*/
proc datasets nodetails nolist;
delete test;
run;

proc ds2;

data test;
   method run();
      set {select * from sql_prep order by Weight};
   end;
enddata;
run;
quit;

More info about ds2 errors with sashelp libs.

Appendix into ds2 docs, about sql in ds2.

Sanek Zhitnik
  • 716
  • 1
  • 10
  • 25