I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients from PROC MIXED. The main outline is as follows:

I have a panel data set indexed by, say, firm and year. For each iteration of the bootstrap, I wish to sample n subjects with replacement. From this sample, I need to construct a new data set that is a "stack" (rows concatenated one on top of another) of all the observations for each sampled subject. With this new data set, I can run the regression and pull out the coefficients of interest. Repeat for a number of iterations, say 2000.

Each firm can potentially be selected multiple times, so I need to include its data multiple times in each iteration's data set. A loop-and-subset approach seems computationally burdensome, and my real data set is quite large (a 2 GB .sas7bdat file).

Example pseudo/explanatory code (please pardon all noob errors!):

DATA subjectlist;
  SET mydata;
  BY firm;
  IF first.firm;
RUN;

%macro blockboot(input=, subjects=, iterations=);

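/* count the rows of the subject list (one row per firm) */
proc sql noprint;
  select count(*) into :numberfirms trimmed from &subjects;
quit;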

  %do i = 1 %to &iterations ;
    DATA mytempdat;
      DO i=1 TO &numberfirms;
        rec = ceil(&numberfirms * ranuni(0));

        *** This is where I want to include all observations for the randomly selected subjects;
        *** However, this code doesn't include the same subject multiple times, which...;
        *** ...is what I want;
        SET &INPUT subjects IN &subjects;

      OUTPUT;
      END;
     STOP;

  PROC MIXED DATA=mytempdat;
    CLASS firm year;
    MODEL yval = cov1 cov2 / SOLUTION;
    RANDOM intercept / sub=subject type=un;
    /* want the coefficient estimate on cov1: SolutionF holds the fixed effects */
    ODS OUTPUT SolutionF=outx;
  RUN;

    %IF &i = 1 %THEN %DO;
      DATA outall;
        SET outx;
      RUN;
      %END;
    %ELSE %DO;
      PROC APPEND base=outall data=outx;
      RUN;
      %END;
    %END;  /* end of i = 1 to &iterations loop */

  /* percentile interval for the cov1 coefficient across iterations */
  PROC UNIVARIATE data=outall(WHERE=(Effect = "cov1"));
    VAR Estimate;
    OUTPUT out=final pctlpts=2.5, 97.5 pctlpre=ci;
  RUN;

%mend;

%blockboot(input=mydata, subjects=subjectlist, iterations=2000)

This question is identical to a question I asked previously, found here:

block bootstrap from subject list

Any help is appreciated!

baha-kev

1 Answer

See the following paper for details on the best way to do this in SAS:

http://www2.sas.com/proceedings/forum2007/183-2007.pdf

In summary: use PROC SURVEYSELECT with a method that samples with replacement (for example METHOD=URS, unrestricted random sampling) to create all of your bootstrap samples in one pass, then use BY-group processing in PROC MIXED so that the procedure runs once rather than 2000 times.
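As a rough illustration of that outline, here is a minimal sketch rather than a tested solution. The names mydata, firm, year, yval, cov1 and cov2 come from the question; the seed, the variable bootid, and the data set names boot_firms, boot_data and boot_est are assumptions made up for this example. It samples firms rather than rows, then joins the draws back to the panel so that a firm drawn twice contributes its rows twice.

```sas
/* Sketch only. bootid, boot_firms, boot_data, boot_est and the seed
   are hypothetical names/values chosen for illustration. */

/* 1. One row per firm to sample from. */
proc sort data=mydata(keep=firm) out=subjectlist nodupkey;
  by firm;
run;

/* 2. Draw 2000 replicates of firms with replacement (METHOD=URS).
      SAMPRATE=1 makes each replicate as large as the firm list;
      OUTHITS writes one row per draw, so a firm drawn twice appears twice. */
proc surveyselect data=subjectlist out=boot_firms
                  method=urs samprate=1 reps=2000 outhits seed=12345;
run;

/* 3. Number the draws within each replicate so that repeated firms
      are treated as separate subjects in the model. */
data boot_firms;
  set boot_firms;
  by replicate;
  if first.replicate then bootid = 0;
  bootid + 1;
run;

/* 4. Stack all observations of every drawn firm. The join repeats a
      firm's rows once per draw. */
proc sql;
  create table boot_data as
  select b.replicate, b.bootid, d.*
  from boot_firms as b inner join mydata as d
    on b.firm = d.firm
  order by b.replicate, b.bootid, d.year;
quit;

/* 5. Fit the model once with BY processing and keep the fixed effects. */
ods exclude all;                 /* suppress printed output for 2000 fits */
proc mixed data=boot_data;
  by replicate;
  class bootid;
  model yval = cov1 cov2 / solution;
  random intercept / subject=bootid type=un;
  ods output SolutionF=boot_est;
run;
ods exclude none;

/* 6. Percentile interval for the cov1 coefficient across replicates. */
proc univariate data=boot_est(where=(Effect = "cov1")) noprint;
  var Estimate;
  output out=final pctlpts=2.5 97.5 pctlpre=ci;
run;
```

One caveat on this design: boot_data holds roughly as many rows as the original data set for every replicate, so with a 2 GB input and 2000 replicates it may be necessary to keep only the model variables in step 4, or to run steps 2 through 5 in batches of replicates and append the boot_est tables.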

Joe