11

I have 60 sas datasets that contain data on consumers individual characteristics such as id, gender, age, amountSpent, .... Each dataset shows data only for one time period (data1 is Jan, data2 is Feb...). I cannot merge them because of the size and some other issues.

How can I write a multiple loop to go through each of the datasets, do some manipulations and save the estimated values to a temporary file.

SAS does not have a for loop. How can I use do?

LittleBobbyTables - Au Revoir
  • 32,008
  • 25
  • 109
  • 114
Buras
  • 3,069
  • 28
  • 79
  • 126

5 Answers5

7

But sas does have a do while macro loop. So basically you need 3 things: 1. In some way, a listing of your datasets. 2. A macro that loops over this listing. 3. The stuff you want to do.

E.g., let us presume you have a dataset WORK.DATASET_LIST that contains a variable library (libname) and a variable member (dataset name) for every dataset you want to loop across.

Then you could do:

%macro loopOverDatasets();
    /*imho good practice to declare macro variables of a macro locally*/
    %local datasetCount iter inLibref inMember;

    /*get number of datasets*/
    proc sql noprint;
        select count(*)
         into :datasetCount
        from WORK.DATASET_LIST;
    quit;

    /*initiate loop*/
    %let iter=1;
    %do %while (&iter.<= &datasetCount.);
        /*get libref and dataset name for dataset you will work on during this iteration*/
        data _NULL_;
            set WORK.DATASET_LIST (firstobs=&iter. obs=&iter.); *only read 1 record;
            *write the libname and dataset name to the macro variables;
            call symput("inLibref",strip(libname));
            call symput("inMember",strip(member));
            *NOTE: i am always mortified by the chance of trailing blanks torpedoing my code, hence the strip function;
        run;

        /*now you can apply your logic to the dataset*/
        data &inLibref..&inMember.; *assuming you want to apply the changes to the dataset itself;
            set &inLibref..&inMember.;
            /*** INSERT YOUR LOGIC HERE ***/
        run;

        /*** ANY OTHER PROCS/DATA STEPS ***/
        /*just remember to use &inLibref..&inMember. to refer to the current dataset*/

        /*increment the iterator of the loop*/
        %let iter=%eval(&iter.+1);
    %end;
%mend;

/*call the macro*/
%loopOverDatasets()

That is the idea. Maybe you want to gather the list of your datasets in a different way. e.g., a macro variable containing them all. In that case you'll have to use the %scan function in the loop to pick a dataset. Or maybe there is logic in the naming, e.g., dataset1, dataset2, dataset3..., in which case you could simply make use of the &iter. macro variable.

Vasilij Nevlev
  • 1,449
  • 9
  • 22
mvherweg
  • 1,272
  • 9
  • 11
  • I was assuming he meant the datasets really had a dataset_name_1 dataset_name_2 kind of naming scheme which makes the %do a lot simpler. But your answer is nicely more generic! – Neil Neyman Aug 01 '13 at 05:08
  • Hmm indeed, if the naming scheme is indeed that way, you could simplify it a bit. I've edited at the end to make mention of it. – mvherweg Aug 01 '13 at 05:09
7

Another option is to create a view that combines all the datasets, you won't get any data size problems with this approach (although I don't know if the other issues you refer to would be a problem here). You'll need a list of the relevant datasets, which may be obtained from DICTIONARY.TABLES in PROC SQL.

proc sql noprint;
select memname into :ds_list separated by ' '
from dictionary.tables
where libname='XXXXX';
quit;

data combined / view=combined;
set &ds_list;
run;

Then just run your summary against the view, so there's no need to loop through each dataset. I assume your datasets have a date variable, otherwise you'll need to add in some extra functionality (this applies to any solution). It would be interesting to see how this performs compared to the other solutions here.

Longfish
  • 7,582
  • 13
  • 19
  • 1
    If you're able to do this, I **highly** recommend it. I'll post a different solution for doing the separate processes, but combining them together is far superior. – Joe Aug 01 '13 at 13:39
  • 3
    Keith, would you (or would you mind if I) add a few pointers about the eventual processing using `by` groups (and perhaps a link to [don't be loopy](http://www2.sas.com/proceedings/forum2007/183-2007.pdf)? That would make this answer more useful, I'd think. – Joe Aug 01 '13 at 13:40
7

This is how I solve this problem in my day to day programming, when it is necessary to call a macro repeatedly based on data. This approach works equally well with many datasets or many variables from one dataset or many different macro calls from one dataset - whichever it is, simply create a dataset with the information that varies and call it this way.

This approach combines elements of Shorack's solution with user2337871's and Neil's. Why do it differently?

  • Macro should be called with parameters, not contain its parameter definitions inside of it. That makes it more flexible for future use (where for example the dataset_list dataset might be something different).
  • Having flexibility to call based on dataset of names rather than requiring macro to call macro
  • Removing code into a macro (instead of inside the call execute or other calling method) makes it easier to read.
  • call execute may have some drawbacks depending on the manipulation you are doing (related to macro variable timing)

Let's say you are doing a PROC MEANS and then appending that to a master dataset. While this is actually a very slow and annoying way to do that (as opposed to combining them together and using BY, or even using ODS OUTPUT with noncombined datasets), we'll assume your actual task is more complicated.

%macro do_my_stuff(dataset=);
proc means data=&dataset noprint;
var count;
output out=dsn_&dataset. mean=;
run;

proc append base=results data=dsn_&dataset. force;
run;
%mend do_my_Stuff;

proc sql;
select cats('%do_my_stuff(dataset=',name,')') into :stufflist separated by ' '
from dictionary.tables
where memname='WORK';
quit;

&stufflist;

You can add additional criteria to the where statement in the proc sql, or call that using CALL EXECUTE, or a number of different options. You can also use a self-made dataset with the dataset names (and even the variables as another column and macro parameter, if the variable(s) of interest vary by dataset).

Joe
  • 62,789
  • 6
  • 49
  • 67
  • I like this also. I agree that it makes more sense -- or would be at least better practice to pass the datasets as a list rather than in a specific self-contained macro. – Neil Neyman Aug 01 '13 at 16:08
3

My go-to answer is macro.

%MACRO process_datasets(mdataset);
     data &mdataset.;
     set &mdataset.;
     if age >= '50' then discountRate = .2;
     *whatever else you need here;
     run;

     data _null_;
       file 'tmp.csv' mod dsd dlm=',';  *I'm assuming you're saving everything to the same file;
       set &mdataset.;
       put (_all_) (+0);
     run;

%MEND process_datasets;

then you can call it from another macro loop...

%MACRO loop_through_all;
    %DO i = 1 to 60;
       %process_datasets(data&i.);
     %END;
%MEND loop_through_all;

%loop_through_all;
Neil Neyman
  • 2,116
  • 16
  • 21
  • +1 Admittedly, for the specific scenario of Buras, this is a simpler representation. (but the same overall idea i would say) – mvherweg Aug 01 '13 at 05:10
3

No need for macros, CALL EXECUTE will process every dataset in a libname or multiple libnames by simply building a query on the SASHELP.VTABLE and then executing a data step for each instance. I typically strip or compress the dataset names to ensure blanks don't cause problems, but will let you add that yourself. You can also make the relevant changes to subset and append the results to a single temp dataset.

DATA WANT;
    SET SASHELP.VTABLE (KEEP = LIBNAME MEMNAME  WHERE = (LIBNAME = "MAPSSAS")) END=EOF;
    STR = COMPBL("DATA " || MEMNAME || "; SET " || LIBNAME || "." || MEMNAME ||";" );
    STR1
    CALL EXECUTE (STR);

    IF EOF THEN DO;
        STR = 'RUN;';
        CALL EXECUTE (STR);
    END;
RUN;
user2337871
  • 450
  • 1
  • 5
  • 14
  • While the approach you propose would certainly work, i find that it results in code that becomes harder to read. – mvherweg Aug 01 '13 at 08:47
  • I disagree. I think the code is very straight forward, even if the second datastep is more complex there are other ways for this to be written to make it more appealing to the reader. The log also produces the resolved code which is also very easy to interpret. It is more succinct and also more efficient. – user2337871 Aug 01 '13 at 09:08
  • I'm sorry but this is certainly not more efficient. Ease of reading depends on whatever you're used to. Personally I work with macros a lot more than call execute but that's my personal preference. I can certainly understand how opinions may differ there. If there was a *very* large number of tables then this process may hit issues with the amount of generated code hitting the stack. – Robert Penridge Aug 01 '13 at 19:26