20

I wonder if there is a way of detecting whether a data set is empty, i.e. it has no observations. Or in another saying, how to get the number of observations in a specific data set.

So that I can write an If statement to set some conditions.

Thanks.

Robert Penridge
  • 8,424
  • 2
  • 34
  • 55
mj023119
  • 675
  • 5
  • 12
  • 19

7 Answers7

18

It's easy with PROC SQL. Do a count and put the results in a macro variable.

proc sql noprint;
 select count(*) into :observations from library.dataset;
quit;
Laurent de Walick
  • 2,154
  • 14
  • 11
  • Thanks Laurent for the quick answer. It seems **select into** is very useful. – mj023119 Apr 15 '11 at 10:59
  • 5
    Hi Laurent - This is ok for small datasets but for large datasets it can take quite a long time to execute as it has to read through each observation instead of just looking at the table's metadata. See some of the other answers below for a more efficient approach. – Robert Penridge Oct 19 '11 at 18:12
  • 1
    `count(var_name)` reads through each observation, but `count(*)` does not. – Ryan Apr 10 '17 at 00:39
  • @Ryan It would be nice if so, but unfortunately that's incorrect. – Robert Penridge Nov 17 '17 at 20:05
15

There are lots of different ways, I tend to use a macro function with open() and attrn(). Below is a simple example that works great most of the time. If you are going to be dealing with data views or more complex situations like having a data set with records marked for deletion or active where clauses, then you might need more robust logic.

%macro nobs(ds);
    %let DSID=%sysfunc(OPEN(&ds.,IN));
    %let NOBS=%sysfunc(ATTRN(&DSID,NOBS));
    %let RC=%sysfunc(CLOSE(&DSID));
    &NOBS
%mend;

/* Here is an example */
%put %nobs(sashelp.class);
cmjohns
  • 4,465
  • 17
  • 21
  • Thanks all the same, Johns. I will try out later. – mj023119 Apr 15 '11 at 10:59
  • Not strictly relevant. But I sometimes think that it's too complicate a way. In R the relevant command is nrow(data.frame1) – xiaodai Dec 04 '14 at 22:16
  • @xiaodai That's why you should make use of either macro autocall libraries, or `proc fcmp`, or both. This then reduces the code to just the final part: `%nobs(sashelp.class)` – Robert Penridge Jul 20 '15 at 18:11
9

Here's the more complete example that @cmjohns was talking about. It will return 0 if it is empty, -1 if it is missing, and has options to handle deleted observations and where clauses (note that using a where clause can make the macro take a long time on very large datasets).

Usage Notes:

This macro will return the number of observations in a dataset. If the dataset does not exist then -1 will be returned. I would not recommend this for use with ODBC libnames, use it only against SAS tables.

Parameters:

  • iDs - The libname.dataset that you want to check.
  • iWhereClause (Optional) - A where clause to apply
  • iNobsType (Optional) - Either NOBS OR NLOBSF. See SASV9 documentation for descriptions.

Macro definition:

%macro nobs(iDs=, iWhereClause=1, iNobsType=nlobsf, iVerbose=1);
  %local dsid nObs rc;

  %if "&iWhereClause" eq "1" %then %do;
    %let dsID = %sysfunc(open(&iDs));
  %end;
  %else %do;
    %let dsID = %sysfunc(open(&iDs(where=(&iWhereClause))));
  %end;

  %if &dsID %then %do;
    %let nObs = %sysfunc(attrn(&dsID,nlobsf));
    %let rc   = %sysfunc(close(&dsID));
  %end;
  %else %do;
    %if &iVerbose %then %do;
      %put WARNING: MACRO.NOBS.SAS: %sysfunc(sysmsg());      
    %end;
    %let nObs  = -1;
  %end;
  &nObs
%mend;

Example Usage:

%put %nobs(iDs=sashelp.class);
%put %nobs(iDs=sashelp.class, iWhereClause=height gt 60);
%put %nobs(iDs=this_dataset_doesnt_exist);

Results

19
12
-1

Installation

I recommend setting up a SAS autocall library and placing this macro in your autocall location.

Robert Penridge
  • 8,424
  • 2
  • 34
  • 55
  • Hi Rob, I suppose you are an advanced user of SAS, too. I will try out when I understand the code. Thanks all the same, anyway. – mj023119 Apr 15 '11 at 11:00
  • Robert, would you know how to modify to also determine if there are neither obs nor vars? Returns 1 in that situation. I tried playing around with various attrn options but I can't find the way. – milcak Jul 20 '15 at 16:36
  • 1
    @milcak Sure, the code you need to test the numbers of variables is `%let nVars = %sysfunc(attrn(&dsID,nvars));`. This line of code would need to be executed where the `%let nobs` statement is running. I'll leave the rest of the logic to you to figure out as far as what value you'd like to return. – Robert Penridge Jul 20 '15 at 17:16
4

Proc sql is not efficient when we have large dataset. Though using ATTRN is good method but this can accomplish within base sas, here is the efficient solution that can give number of obs of even billions of rows just by reading one row:

data DS1;
set DS nobs=i;
if _N_ =2 then stop;
No_of_obs=i;
run;
Altumish
  • 69
  • 3
  • 3
    +1 for a solution that doesn't use those dreaded `%` signs, just be aware that there are cases where `nobs` won't work. http://www2.sas.com/proceedings/sugi26/p095-26.pdf – Hong Ooi Jun 29 '13 at 14:35
1

The trick is producing an output even when the dataset is empty.

data CountObs;

    i=1;
    set Dataset_to_Evaluate point=i nobs=j; * 'point' avoids review of full dataset*;
    No_of_obs=j;
    output;  * Produces a value before "stop" interrupts processing *;
    stop;   * Needed whenever 'point' is used *;
    keep No_of_obs;
run;

proc print data=CountObs;
run;

The above code is the simplest way I've found to produce the number of observations even when the dataset is empty. I've heard NOBS can be tricky, but the above can work for simple applications.

mmohab
  • 2,303
  • 4
  • 27
  • 43
  • You need to move the output statement BEFORE the set statement to allow it to handle input datasets with zero observations. – Tom Nov 10 '15 at 13:55
1

A slightly different approach:

proc contents data=library.dataset out=nobs;
run;

proc summary data=nobs nway;
class nobs;
var delobs;
output out=nobs_summ sum=;
run;

This will give you a dataset with one observation; the variable nobs has the value of number of observations in the dataset, even if it is 0.

1

I guess I am trying to reinvent the wheel here with so many answers already. But I do see some other methods trying to count from the actual dataset - this might take a long time for huge datasets. Here is a more efficient method:

proc sql;
select nlobs from sashelp.vtable where libname = "library" and memname="dataset";
quit;
enautic
  • 31
  • 5
  • 1
    You actually need to use NLOBS instead of NOBS. That variable takes into account the number of deleted observations (DELOBS). – Tom Sep 13 '16 at 14:12
  • Good addition. There's also `dictionary.tables` if you don't want to use `sashelp.vtable` as well. I can't remember the exact circumstance but once upon a time I had a situation where one of them took almost 30 seconds to run while the other was instant. E.g. `proc sql noprint; select nlobs from dictionary.tables where libname='SASHELP' and memname='CLASS'; quit;` – Robert Penridge Mar 23 '17 at 23:05