4

I have a dataset like this (but with several hundred vars):

id  q1  g7  q3  b2  zz  gl  az  tre
1   1   2   1   1   1   2   1   1
2   2   3   3   2   2   2   1   1
3   1   2   3   3   2   1   3   3
4   3   1   2   2   3   2   1   1
5   2   1   2   2   1   2   3   3
6   3   1   1   2   2   1   3   3

I'd like to keep id, b2, and tre, but set everything else to missing. In a dataset this small, I can easily use call missing (q1, g7, q3, zz, gl, az) - but in a set with many more variables, I would effectively like to say call missing (of _ALL_ *except ID, b2, tre*).

Obviously, SAS can't read my mind. I've considered workarounds that involve another data step or proc sql where I copy the original variables to a new ds and merge them back on post, but I'm trying to find a more elegant solution.

J.Q
  • 971
  • 1
  • 14
  • 29

3 Answers3

8

This technique uses an un-executed set statement (compile time function only) to define all variables in the original data set. Keeps the order and all variable attributes type, labels, format etc. Basically setting all the variables to missing. The next SET statement which will execute brings in only the variables the are NOT to be set to missing. It doesn't explicitly set variables to missing but achieves the same result.

   data nomiss;
       input id  q1  g7  q3  b2  zz  gl  az  tre;
       cards;
    1   1   2   1   1   1   2   1   1
    2   2   3   3   2   2   2   1   1
    3   1   2   3   3   2   1   3   3
    4   3   1   2   2   3   2   1   1
    5   2   1   2   2   1   2   3   3
    6   3   1   1   2   2   1   3   3
    ;;;;
       run;
    proc print;
       run;
    data manymiss;
       if 0 then set nomiss;
       set nomiss(keep=id b2 tre:);
       run;
    proc print;
       run;

enter image description here

data _null_
  • 8,534
  • 12
  • 14
  • 2
    Good answer. It would be better though if you explained why it works - I suspect this would be hard for a lot of midlevel programmers to understand just looking at the code (as to why `if 0` works). – Joe Feb 04 '16 at 22:14
3

Another fairly simple option is to set them missing using a macro, and basic code writing techniques.

For example, let's say we have a macro:

%call_missing(var=);
  call missing(&var.);
%mend call_missing;

Now we can write a query that uses dictionary.columns to identify the variables we want set to missing:

proc sql;
  select name 
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
    and not (name in ('ID','B2','TRE'));  *note UPCASE for all these;
quit;

Now, we can combine these two things to get a macro variable containing code we want, and use that:

proc sql;
  select cats('%call_missing(var=',name ,')')
    into :misslist separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
    and not (name in ('ID','B2','TRE'));  *note UPCASE for all these;
quit;

data want;
  set have;
  &misslist.;
run;

This has the advantage that it doesn't care about the variable types, nor the order. It has the disadvantage that it's somewhat more code, but it shouldn't be particularly long.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • thanks for this. I had to use this code snippet - but to make it work I had to replace "dictionary.column" with "sashelp.vcolumn". – user2450223 Oct 27 '22 at 17:46
  • Yep, those are identical (just `dictionary.column` is available only in PROC SQL, while `sashelp.vcolumn` is available in the data step and elsewhere.) – Joe Oct 27 '22 at 18:40
1

If the variables are all of the same type (numeric or character) then you could use an array.

data want ;
  set have;
  array _all_ _numeric_ ;
  do over _all_;
     if upcase(vname(_all_)) not in ('ID','B2') then _all_=.;
  end;
run;

If you don't care about the order then just drop the variables and add them back on with 0 observations.

data want;
   set have (keep=ID B2 TRE:) have (obs=0 drop=ID B2 TRE:);
run;
Tom
  • 47,574
  • 2
  • 16
  • 29
  • That's useful - thanks! Is it much more complicated if they're not? – J.Q Feb 02 '16 at 20:31
  • @toandthrough If they're not, then you can't put them in one array - you'd have to make two arrays. – Joe Feb 02 '16 at 21:12
  • Tom, I wouldn't recommend naming an array `_all_`, it seems likely to confuse things with the automatic variable list `_all_` (if SAS would even allow it?) – Joe Feb 02 '16 at 21:13
  • Also - it's probably worth noting that you're doing the inverse of what OP asked in terms of which variables are which. Hopefully obvious and doesn't require much of a change, but worth pointing out. – Joe Feb 02 '16 at 21:14
  • That's coding for maximum job security. :-) Of course most of the programmers I work with don't even know what \_ALL\_ is. – data _null_ Feb 02 '16 at 21:34