3

I have: 6 files, named as follows: "ROSTER2008", "ROSTER2009", ..., "ROSTER2013" Each file has these variables: TEAMID and MATE1, MATE2, ..., MATEX. The names of the teammates are stored for each team, through to teammate X.

Now: I want to loop through code that reads in the 6 files and creates one output file with TEAM, MATES2008, MATES2009, ..., MATES2013, where the MATES20XX variables contain the number of teammates on each team in that respective year (I no longer care about their names).

Here is what I've tried to do:

%macro sqlloop(start, end);
proc sql;
     %DO year=&start. %TO &end.;
         create table mates_by_team_&year. as 
         select distinct put(teamID, z2.) as team,
         count(*) as mates
         from myLibrary.rosters
         where d='ROSTER(HOW DO I PUT YEAR HERE?)'.d;
     %end;
quit;
%mend;

%sqlloop(start=2008, end=2013)

SO, here are my questions:

  1. How do I fix the 'where' statement such that it understands the file name I intend to pull in on each iteration? (That is ROSTER2008,...,ROSTER2013)

  2. Right now, I am creating 6 different tables for each file I input. How can I, instead, join these files into one table as I have described above?

Thank you in advance for your help!

Xtina

Xtina
  • 33
  • 4
  • My syntax might be a little off but you can place the 'year' variable right next to the string literal like so: `where d = 'ROSTER'&year.`. For combining the six files together, I'd simply let the macro create six datasets, then use a data step after the proc sql to merge them together. – Pat Jones Oct 13 '16 at 21:40

2 Answers2

2

For the sake of simplicity I have assumed that your X in MATEX never goes beyond 999. Obviously you can increase the limit or you can use proc contents or dictionary tables to find out the max limit.

I have pulled all the variables Mate1:MateX in an array and checking if they are missing or not and increasing the counter to count the number of team mates.

%macro loops(start=,end=);

%do year=&start. %to &end.;
data year_&year.(keep=teamid Mates_&year.);
set roster&year.;
array mates{*} mate1-mate999;
Mates_&year.=0;
do i=1 to 999;
if not missing(mates{i}) then Mates_&year.=Mates_&year.+1;
end;
run;

proc sort data=year_&year.;
by teamid;
%end;

At the end I have merged all the datasets together.

data team;
merge year_&start.-year_&end.;
by teamid;
run;

%mend;

%loops(start=2008,end=2009);
in_user
  • 1,948
  • 1
  • 15
  • 22
0

Assuming those are SAS datasets and your using SAS 9.3+ I'd recommend a data step and proc freq approach. It's infinitely easier to understand. If you wanted to make it a macro replace the years (2008/2012) with your macro variables.

Edit: Added N() to count number of teammates assuming it follows the mate1 mate2 ... matex. The single datastep would have the answer and you can drop the variables.

Data combined;
Set roster2008 - roster2012 indsname=source;

 Year = substr(scan(source, 2, '.'), 7);

 Num_mates = n(of mate:); 

 *drop mate:;

 Run;

 PROC sort data=combined;
 By teamid year;
 Run;

 PROC transpose data=combined out=want prefix=mate;
 By teamid;
 VAR num_mates;
  Id year;
  Run;

  *OLD CODE BASED ON SQL SOLUTION;
 *Proc freq data=combined;
    *Table dsn*teamid/list;
  *Run;
Reeza
  • 20,510
  • 4
  • 21
  • 38
  • Reeza, members of the same team are arranged horizontally and not vertically, so proc freq would not work. – in_user Oct 14 '16 at 03:36
  • Then the original SQL query was entirely off base. I did notice that after so the OP would have to clarify. – Reeza Oct 14 '16 at 05:09
  • @NEOmen I've modified my code based on that assumption. I'm still not a fan of a macro solution when it's easier to do in other ways. – Reeza Oct 14 '16 at 05:17