2

I have a SAS dataset with 3 columns. A FirmIndex, a ProducIndex and a third column called PrChange. Within each FirmIndex & ProductIndex group I want to count how many PrChange are different from . and from 0, and add that to a column called Number. Then I want to divide that column Number by the number of observations within each group which are not ..

Below an example of the dataset and desired output.

data prod;
input firmindex productindex PrChange Number Fract;
cards;

1   1   .   1   0.250
1   1   0.00    1   0.250
1   1   0.00    1   0.250
1   1   -0.40   1   0.250
1   1   0.00    1   0.250
1   2   .   2   1.000
1   2   1.00    2   1.000
1   2   0.30    2   1.000
1   3   .   4   0.800
1   3   0.70    4   0.800
1   3   1.00    4   0.800
1   3   0.70    4   0.800
1   3   0.00    4   0.800
1   3   -0.30   4   0.800
1   4   .   5   1.000
1   4   0.20    5   1.000
1   4   -1.00   5   1.000
1   4   -0.90   5   1.000
1   4   -0.50   5   1.000
1   4   1.00    5   1.000
2   1   .   2   1.000
2   1   0.30    2   1.000
2   1   -0.50   2   1.000
2   2   .   5   0.714
2   2   0.30    5   0.714
2   2   0.10    5   0.714
2   2   0.00    5   0.714
2   2   0.00    5   0.714
2   2   0.80    5   0.714
2   2   -0.20   5   0.714
2   2   0.40    5   0.714
2   3   .   1   1.000
2   3   0.60    1   1.000
2   4   .   5   0.714
2   4   -1.00   5   0.714
2   4   0.80    5   0.714
2   4   -0.20   5   0.714
2   4   0.00    5   0.714
2   4   0.00    5   0.714
2   4   -0.70   5   0.714
2   4   0.90    5   0.714
2   5   .   3   1.000
2   5   0.90    3   1.000
2   5   -0.70   3   1.000
2   5   -0.50   3   1.000
;
run;

Here is what I tried to generate the column number, but it is not working:

data work.prod;
    set work.prod;
    by firmindex productindex;
    if first.productindex  or first.firmindex then sum = 0;
        else if PrChange ne 0 and PrChange ne .;
        sum = sum + 1;
run;
Joe
  • 62,789
  • 6
  • 49
  • 67
phdstudent
  • 1,060
  • 20
  • 41
  • Is your desired result contained in posted data? – Parfait May 05 '17 at 19:01
  • Yes, indeed it is. The last two columns are the desired output. – phdstudent May 05 '17 at 19:03
  • Sorry. It was an answer to someone else comment. That someone else in the meantime deleted the comment. I am testing your solution now. – phdstudent May 05 '17 at 19:13
  • BTW your language there ("vector") makes me think you're used to matrix languages (e.g. R) - if you are, do you use SAS/IML? That might be more straightforward for you solution-wise. – Joe May 05 '17 at 19:14
  • Thanks. I am indeed more used to matrix languages (matlab, Fortran, never used R). I will take a look into SAS/IML. – phdstudent May 05 '17 at 19:15
  • Added the [tag:sas-iml] to see if one of the IML folks will come by and answer in that vein better than my answer, since I feel like this is well suited for that product. – Joe May 05 '17 at 20:31

4 Answers4

1

Your problem here is that you need the number to divide by prior to running the rows of data. This is where SAS is different from Excel; SAS is row-based, meaning it takes your code and runs it against each row of data (more or less) one at a time, rather than dynamically looking at every cell from every other cell (like Excel). Much faster and more efficient but less flexible for stuff like this.

Your particular question begs for a DoW loop. This takes over the normal data step loop and performs its own loop - twice. Once to calculate the number/fract values, then once to copy those to the BY group. Note I only check for last.productIndex; last/first transitions are always set on a second by variable when they're true for the first by variable.

Here we do the first loop once for the first set of values( the first 5 records) then we re-loop through the same 5 records. Then for the next 3. Etc. Each time the two loops take the same number of rows so they always stay in sync.

data want;
  do _n_ = 1 by 1 until (last.productIndex);
    set have;
    by firmindex productindex;
    number_denom = sum(number_Denom,not missing(PrChange));
    number       = sum(number, not (PrChange in (.,0)));
  end;
  fract = number/number_denom;
  do _n_ = 1 by 1 until (last.productIndex);
    set have;
    by firmindex productindex;
    output;
  end;
run;
Joe
  • 62,789
  • 6
  • 49
  • 67
1

I'm going to give the IML answer that I'm able to give. Rick or someone else more IML-savvy probably can do better than this. In R or other matrix language I think this would be much easier, but I don't have the IML chops to do this without looping; maybe it's possible.

proc iml;
  use have;
  read all var _all_ into h;
  u =  h[uniqueby(h,1:2), 1:2];   *generate the "unique" categories for the first two columns;

  v = j(nrow(h),5);               *generate a matrix to save this into;
  v[,1:3] = h;                    *start it out with the first three columns of the dataset;
  do i  = 1 to nrow(u);           *iterate over the unique category matrix;
    number = ncol(loc(h[loc((h[,1:2] = u[i,1:2])[,#]),3]));
                                  *the inner LOC produces a two column 1/0 matrix with match 1 / nomatch 0 for each col
                                   then reduce to 1 column via subscript reduction product, to get correct 1/0 match vector
                                   the outer LOC takes the rows of h from that (so rows of h matching u), then returns nonzero/nonmissing
                                   which then ncol summarizes into a count;
    fract_denom = ncol(loc(h[loc((h[,1:2] = u[i,1:2])[,#]),3] ^= .));
                                  *similar, but here we have to verify they are not missing explicitly, considering 0 valid;

   v[loc((v[,1:2] = u[i,1:2])[,#]),4] = number;             *assign to col4 of V;
   v[loc((v[,1:2] = u[i,1:2])[,#]),5] = number/fract_denom; *assign to col5 of V;

  end;
  print v;

quit;

This uses the unique-loc method, more or less, with some modifications; probably is an easier way to get the matches.

Joe
  • 62,789
  • 6
  • 49
  • 67
0

A SQL in SAS solution - Parfait's is probably the better one overall, but SAS's willingness to remerge makes the SASsy solution a bit simpler.

proc sql;
  create table want as
    select firmindex, productindex, prchange, 
           sum (not (prchange in (0,.))) as number,
           calculated number / (sum ( not missing(prchange))) as fract
       from have
       group by firmindex, productindex;
quit;

SAS will do the grouping/counting/etc. and then merge back to the original dataset with no problem, skipping the need for correlated subqueries. NOT standard SQL, but quite common in SAS nonetheless.

Joe
  • 62,789
  • 6
  • 49
  • 67
-1

Consider proc sql using conditional CASE WHEN correlated subqueries:

proc sql;
    create table ProdChangeCount as
    SELECT p.firmindex, p.productindex,
           (SELECT SUM(CASE WHEN sub.PrChange ^= . AND sub.PrChange ^= 0 THEN 1 ELSE 0 END)
            FROM Prod sub
            WHERE sub.firmindex = p.firmindex 
            AND sub.productindex = p.productindex) AS Number,

           CALCULATED Number / 
           (SELECT Count(*)
            FROM Prod sub
            WHERE sub.PrChange ^= . 
            AND sub.firmindex = p.firmindex 
            AND sub.productindex = p.productindex) AS Frac
    FROM Prod p;

quit;
Parfait
  • 104,375
  • 17
  • 94
  • 125