Can anyone tell me what is the fastest way to programmatically convert a SAS dataset into a CSV file. I know I can use a data step and output to a file etc. But is that the only way?
Thanks, Adnan.
something along these lines?
proc export data=sashelp.class
outfile='c:\temp\sashelp class.csv'
dbms=csv
replace;
run;
5 different ways to create .CSV file of a SAS dataset.
refer http://studysas.blogspot.com/2009/02/how-to-create-comma-separated-file-csv.html
The answers here have addressed many ways to create a csv, but not necessarily the efficiency of these methods. I create a sample dataset (temp
) with 100,000 rows and five columns to test and compare the methods for export the sas dataset to a csv named temp.csv
.
The first method: The Export Procedure. Run time: 0.43 seconds.
PROC EXPORT data=temp
outfile="temp.csv" dbms=csv replace;
Quick run time and simple, in-the-box flexibility when choosing other options. With that said, it is not the strongest for customization
The second method: ODS with Print Procedure. Run time: 14.09 seconds.
ODS csv file="temp.csv";
PROC PRINT data=temp noobs;
RUN;
ods csv close;
This method is the worst option out of the three for most use cases, although there a few special use cases. It is nice for temporary output of previously written procedures, especially if you want the output to stay in the lst file (if it is not too big). It also may be useful when you want to convert another procedure (for instance, a complicated tabulate
) to file without further manipulation. If you do not need the print out in the lst file, close your listing (ods listing close
) or this will take much, much longer.
The third method: File Statement. Run time: 0.06 seconds.
DATA _null_;
FILE "temp.csv ";
SET temp;
put (_all_) (',');
RUN;
While the performance of this method is not bad, it is not intuitive and looks confusing. As mentioned above however, you would have more control over the output, and it has the quickest run time of them all.
data _NULL_
approachHere's a modification to the data _NULL_
approach that writes out header rows to the file in the first datastep and then continues to write the same file with the data rows in the second datastep.
%macro outputCSV(dataset,file);
data _NULL_;
file "&file." dlm=',' dsd;
length header $ 2000;
dsid=open("&dataset.","i");
num=attrn(dsid,"nvars");
do i=1 to num;
header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
put header @;
end;
rc=close(dsid);
run;
data _NULL_;
set &dataset.;
file "&file." mod dlm=',' dsd;
put (_all_) (+0);
run;
%mend;
%outputCSV(sashelp.class,~/temp4.csv)
It can be made to work with keep
and drop
data set options (surprisingly), although unfortunately when a dataset is 'opened' using open()
, the nvars
attribute respects the new number of variables after keep
or drop
, but the varlabel
and varname
functions still look for variables using their varnum
.
In the following example, just Name
(varnum=1) and Height
(varnum=4) are extracted from SASHELP.CLASS
, because only two variables are kept, nvars
is 2, but if we iterate over a loop with num as the upper bound, we miss out Height
, because its varnum
is 4:
62 data _NULL_;
63 dsid = open("sashelp.class (keep=name height)","i");
64 num = attrn(dsid,"nvars");
65 do i=1 to 5;
66 vname = varname(dsid,i);
67 put i= vname= num=;
68 end;
69 run;
i=1 vname=Name num=2
NOTE: Argument 2 to function VARNAME(1,2) at line 66 column 13 is invalid.
i=2 vname= num=2
NOTE: Argument 2 to function VARNAME(1,3) at line 66 column 13 is invalid.
i=3 vname= num=2
i=4 vname=Height num=2
NOTE: Argument 2 to function VARNAME(1,5) at line 66 column 13 is invalid.
i=5 vname= num=2
dsid=1 num=2 i=6 vname= _ERROR_=1 _N_=1
There are two options:
nvars
value by stripping off the dataset options for the first passnum
, although this just leads to lots of 'invalid'
notes in the logThis is the first approach, requiring two calls to the open
function:
%macro outputCSV(dataset,file);
data _NULL_;
file "&file." dlm=',' dsd;
length header $ 2000;
dsid=open("%SCAN(&dataset.,1,()","i");
num=attrn(dsid,"nvars");
rc=close(dsid);
dsid=open("&dataset.","i");
do i=1 to num;
header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
if _error_ = 0 then put header @;
_error_ = 0;
end;
rc=close(dsid);
run;
data _NULL_;
set &dataset.;
file "&file." mod dlm=',' dsd;
put (_all_) (+0);
run;
%mend;
%outputCSV(sashelp.class (keep=name height),~/temp4.csv)
After writing all of that out, it probably makes sense to go with PROC EXPORT
in most cases, or explicitly list the variables if there aren't many. This just shows what can be done.