1

Suppose I'm subsetting a table and summarizing it in proc sql. The code uses a where ... in clause and a subquery to do the subsetting. I know that some SQL engines would set some limit on the number of arguments to the where ... in clause. Does SAS has have limit on this? This question would apply to a program like this:

proc sql;
    create table want as
    select
        ID,
        sum(var1) as var1,
        sum(var2) as var2,
        sum(var3) as var3
    from largetable
    where ID in (select ID from longlist)
    group by ID;
quit;

What if longlist returns 10,000 IDs? How about 10,000,000?

Joe
  • 62,789
  • 6
  • 49
  • 67
Jeff
  • 1,787
  • 9
  • 14

2 Answers2

4

As Joe has said, there should probably be no problems with any reasonable number of rows in the longlist table. However, although this may be readable, a join may perform better.

Do you have a strong preference for running the query as written rather than doing a left join, e.g.

proc sql;
    create table want as
    select
        b.ID,
        sum(b.var1) as var1,
        sum(b.var2) as var2,
        sum(b.var3) as var3
    from longlist a left join largetable b
     on a.ID = b.ID
     group by b.ID;
quit;

Elaborating a bit on entering a long list as text - I'm not aware of any limit on the length of any one statement in SAS, but there are various limits on the length of individual lines of code, depending on your version and how you're submitting it. I suspect it's possible to split a long statement over several lines each approaching the maximum allowed length.

Community
  • 1
  • 1
user667489
  • 9,501
  • 2
  • 24
  • 35
  • No strong preference. This question was not really about how to do something, but just to confirm this particular syntax wasn't dangerous. Thanks for info! – Jeff Jan 24 '15 at 14:39
4

I'm not aware of any explicit limit on this. SAS's SQL parser seems to convert these often to JOINs, when they're not explicitly coded in the table; that means there are some limitations, but not particularly small ones.

I do believe there is a limit to the length of a SQL statement in total, so if you were trying to include an extremely long list in text you might run into problems, but in the example above I don't see a problem with 10,000,000 IDs. I just tested it with 250,000,000 IDs in the longlist table, and SAS had no problem with it:

data largetable;
  do id=1 to 1e8;
    if mod(id,7)=0 then output;
  end;
run;

data ids;
  do id = 1 to 1e9;
    if mod(id,4)=0 then output;
  end;
run;

proc sql _method;
    create table want as
    select
        ID
    from largetable
    where ID in (select ID from IDs)
    group by ID;
quit;

Interestingly, adding _method indicates it does not do this as a join, but as a subquery. I'm not sure why, at least in this case; everything I've been told says that it should convert this to a join implicitly.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • I expect that if you connected from SAS to another rdbms it might well be auto-converted. What led you to expect SAS itself to do that? – user667489 Jan 24 '15 at 11:49
  • @user667489 SAS `proc sql` is a SQL optimizer/interpreter/compiler just like any other RDBMS; SQL of any sort is a declarative language, after all, so the interpreter will do various things to optimize the query. SAS's optimizer is not as good as SQL Server's, but it's still pretty good. Converting a subquery `in` to a join is a very common optimization in any flavor of SQL. – Joe Jan 26 '15 at 03:32