26

Can anyone tell me what is the fastest way to programmatically convert a SAS dataset into a CSV file. I know I can use a data step and output to a file etc. But is that the only way?

Thanks, Adnan.

Adnan
  • 2,949
  • 4
  • 29
  • 45

4 Answers4

35

something along these lines?

proc export data=sashelp.class
    outfile='c:\temp\sashelp class.csv'
    dbms=csv
    replace;
run;
rkoopmann
  • 1,136
  • 7
  • 10
  • Yes that was one of the options we'd discussed. Do you know how it scales vs the data step approach? – Adnan Sep 15 '10 at 23:40
  • 4
    based on the log, proc export generates a data null step to write out the text file. so i'd guess using a data step from the start would be more efficient in terms of cpu time. data step would give you more control over the output. – rkoopmann Sep 16 '10 at 02:10
  • 1
    I'm probably going to end up using a data _null_ anyway because I'd like to have control over which columns get exported, and things iike formatting/not formatting column values etc. – Adnan Sep 23 '10 at 22:00
  • 1
    FYI I've noticed that I have to use the -noterminal option when running a SAS program (in batch mode) that uses proc export to create a csv or other flat file. – Banjer Nov 09 '10 at 21:17
11

5 different ways to create .CSV file of a SAS dataset.

refer http://studysas.blogspot.com/2009/02/how-to-create-comma-separated-file-csv.html

Triad sou.
  • 2,969
  • 3
  • 23
  • 27
user466405
  • 127
  • 2
  • You missed the `%ds2csv` macro: http://support.sas.com/documentation/cdl/en/lebaseutilref/64791/HTML/default/viewer.htm#n0yo3bszlrh0byn1j4fxh4ndei8u.htm – Robert Penridge Apr 25 '18 at 17:00
8

The answers here have addressed many ways to create a csv, but not necessarily the efficiency of these methods. I create a sample dataset (temp) with 100,000 rows and five columns to test and compare the methods for export the sas dataset to a csv named temp.csv.


The first method: The Export Procedure. Run time: 0.43 seconds.

PROC EXPORT data=temp 
     outfile="temp.csv" dbms=csv replace;

Quick run time and simple, in-the-box flexibility when choosing other options. With that said, it is not the strongest for customization


The second method: ODS with Print Procedure. Run time: 14.09 seconds.

ODS csv file="temp.csv";
PROC PRINT data=temp noobs;
RUN;
ods csv close;

This method is the worst option out of the three for most use cases, although there a few special use cases. It is nice for temporary output of previously written procedures, especially if you want the output to stay in the lst file (if it is not too big). It also may be useful when you want to convert another procedure (for instance, a complicated tabulate) to file without further manipulation. If you do not need the print out in the lst file, close your listing (ods listing close) or this will take much, much longer.


The third method: File Statement. Run time: 0.06 seconds.

DATA _null_;
    FILE "temp.csv ";
    SET temp;
    put (_all_) (',');
RUN;

While the performance of this method is not bad, it is not intuitive and looks confusing. As mentioned above however, you would have more control over the output, and it has the quickest run time of them all.

cacti5
  • 2,006
  • 2
  • 25
  • 33
  • 3
    Caution: The data _null_ method will not output the variable names in the first row of your .csv, which might be a problem. – Ketil Tveiten Oct 24 '18 at 07:45
  • 1
    You should use the DSD option on the FILE statement instead of writing the delimiter by hand. Otherwise you might not get a file that can be deciphered. Note that it is not very hard to add a little code to write the header row. – Tom Apr 12 '19 at 04:24
  • @Tom I feel like there's another answer on its way...? – mjsqu Apr 12 '19 at 04:47
3

Modified data _NULL_ approach

Here's a modification to the data _NULL_ approach that writes out header rows to the file in the first datastep and then continues to write the same file with the data rows in the second datastep.

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("&dataset.","i");
   num=attrn(dsid,"nvars");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      put header @;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class,~/temp4.csv)

It can be made to work with keep and drop data set options (surprisingly), although unfortunately when a dataset is 'opened' using open(), the nvars attribute respects the new number of variables after keep or drop, but the varlabel and varname functions still look for variables using their varnum.

In the following example, just Name (varnum=1) and Height (varnum=4) are extracted from SASHELP.CLASS, because only two variables are kept, nvars is 2, but if we iterate over a loop with num as the upper bound, we miss out Height, because its varnum is 4:

 62         data _NULL_;
 63           dsid = open("sashelp.class (keep=name height)","i");
 64           num = attrn(dsid,"nvars");
 65           do i=1 to 5;
 66             vname = varname(dsid,i);
 67             put i= vname= num=;
 68           end;
 69         run;
 
 i=1 vname=Name num=2
 NOTE: Argument 2 to function VARNAME(1,2) at line 66 column 13 is invalid.
 i=2 vname=  num=2
 NOTE: Argument 2 to function VARNAME(1,3) at line 66 column 13 is invalid.
 i=3 vname=  num=2
 i=4 vname=Height num=2
 NOTE: Argument 2 to function VARNAME(1,5) at line 66 column 13 is invalid.
 i=5 vname=  num=2
 dsid=1 num=2 i=6 vname=  _ERROR_=1 _N_=1

There are two options:

  • Extract the true nvars value by stripping off the dataset options for the first pass
  • Use a really high number instead of num, although this just leads to lots of 'invalid' notes in the log

This is the first approach, requiring two calls to the open function:

%macro outputCSV(dataset,file);
data _NULL_;
   file "&file." dlm=',' dsd;
   length header $ 2000;
   dsid=open("%SCAN(&dataset.,1,()","i");
   num=attrn(dsid,"nvars");
   rc=close(dsid);
   dsid=open("&dataset.","i");
   do i=1 to num;
      header = trim(left(coalescec(varlabel(dsid,i),varname(dsid,i))));
      if _error_ = 0 then put header @;
      _error_ = 0;
   end;
   rc=close(dsid);
run;

data _NULL_;
   set &dataset.;
   file "&file." mod dlm=',' dsd;
   put (_all_) (+0);
run;
%mend;

%outputCSV(sashelp.class (keep=name height),~/temp4.csv)

After writing all of that out, it probably makes sense to go with PROC EXPORT in most cases, or explicitly list the variables if there aren't many. This just shows what can be done.

Community
  • 1
  • 1
mjsqu
  • 5,151
  • 1
  • 17
  • 21
  • 1
    No need to use those old SCL functions. Just use PROC TRANSPOSE to create a dataset with the variable names in order. `proc transpose data=have(obs=0) out=name; var _all_; run;` Then use a simple data step to write the header row. Then add the MOD option to the file statement in the data step that writes the actual data lines. – Tom Apr 12 '19 at 04:28
  • Yes, good call, had attempted similar using `array _character_` and its counterpart, but couldn't get the columns to come out in order. – mjsqu Apr 12 '19 at 04:45
  • Thanks @mjsqu! This is a great answer - I incorporated your code into our macro library, I hope that's ok: https://core.sasjs.io/mp__ds2csv_8sas.html – Allan Bowe Dec 16 '20 at 16:44