2

I'm a long-time SAS programmer looking to make the jump to R. I know R isn't all that great for variable re-coding, but is there a way to do this with do loops?

I have a lot of variables named a_1 a_2 ... a_100, b_1 b_2 ... b_100, and I want to create new variables c_1 c_2 ... c_100 where c_i = a_i + b_i. Is there a way to do this without 100 statements?

In SAS I would simply use:

%do i=1 %to 100;
c_&i = a_&i + b_&i;
%end;

Thanks!

user1346861
  • 6
    The answer (which I'm sure will be illustrated below shortly) is to not use free floating variables in R. Put related items in a single data structure like a matrix or a data frame. – joran Apr 20 '12 at 15:13
  • R FAQ 7.21: http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f (plus `?get`) – Ben Bolker Apr 20 '12 at 15:59
  • What class are these variables in SAS? Please add to your question what kind of 'stuff' a_1 contains. – John Apr 20 '12 at 18:10
  • 5
    This does not make sense to me: "R isn't all that great for variable re-coding". Either I am unaware of some facility that R should have, or else I see no difficulty in doing this. Can someone enlighten us as to what great variable recoding is like? – Iterator Apr 21 '12 at 11:58

6 Answers

22

SAS uses a rudimentary macro language that works by text replacement rather than by evaluating expressions, as a proper programming language would. Your SAS files are essentially two things: SAS commands and macro expressions (things starting with '%'). Macro languages are problematic and hard to debug (for example, do expressions within expressions get expanded? Why do you have to write "&&x" or even "&&&x"? Why do you need two semicolons here?). The result is clunky and inelegant compared to a well-designed programming language built on a single syntax.

If your a_i variables are single numbers, then you should store them in a vector, e.g.:

> a = 1:100
> b = runif(100)

Now I can get elements easily:

> a[1]

and add them up in parallel:

> c = a + b

You could do it with a loop, initialising c first:

> c = rep(0,100)
> for(i in 1:100){
   c[i]=a[i]+b[i]
   }

But that would be slower and wordier than the vectorised a + b, and the gap grows with the length of the vectors.

Nearly every R beginner asks 'how do I create a variable a_i for some values of i', and then shortly afterwards they ask how to access variable a_i for some values of i. The answer is always to make a either a vector or a list.
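
If each a_i is itself a vector rather than a single number, the same idea carries over to a list. A minimal sketch, assuming made-up values (the data and the name c_list are illustrative, not from the question):

```r
# Hypothetical data: three "a" vectors and three "b" vectors held in lists
a <- list(1:3, 4:6, 7:9)
b <- list(c(10, 20, 30), c(40, 50, 60), c(70, 80, 90))

# Map() walks both lists in parallel and adds the matching elements
c_list <- Map(`+`, a, b)

# Elements are reached by position, much like a[1] for a vector
c_list[[2]]
```

Here `[[ ]]` pulls out one element of the list, which is the list counterpart of indexing a vector with `[ ]`.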

Spacedman
  • 92,590
  • 12
  • 140
  • 224
7

This stuff is trivial. To me, it looks like you want to find a way to create commands automatically and execute them. Easy peasy.

For instance, this assigns to C_i the value in A_i:

for(i in 1:100){
    tmpCmd = paste("C_",i,"= A_",i, sep = "")
    eval(parse(text = tmpCmd))
}
rm(i, tmpCmd)

Just remember eval(parse(text = ...)) and paste(), and you're off to the races in creating loops of commands to execute.

You can then add in the operation you'd like to do, i.e. the summation with B_i, by swapping in this line:

    tmpCmd = paste("C_",i,"= A_",i," + B_",i, sep = "")

However, others are right that using good data structures is a way to avoid having to do a lot of tedious things like this. Yet, when you need to, such repetitive code isn't hard to devise.
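
For what it's worth, the same kind of loop can usually be written with assign() and get() (the ?get route mentioned in the comments), which look variables up by name without building code as text. A sketch, assuming five made-up A_i and B_i values instead of 100:

```r
# Hypothetical setup: create A_1..A_5 and B_1..B_5 so the example runs on its own
for (i in 1:5) {
  assign(paste0("A_", i), i)
  assign(paste0("B_", i), 10 * i)
}

# C_i = A_i + B_i, looking each variable up by its name
for (i in 1:5) {
  assign(paste0("C_", i), get(paste0("A_", i)) + get(paste0("B_", i)))
}
rm(i)

C_3
```

Because nothing is parsed from a string, a typo in a name shows up as an "object not found" error rather than as a syntax error inside generated code.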

Iterator
  • 20,250
  • 12
  • 75
  • 111
6

I suspect that if you have one hundred variables a_1, a_2, ..., a_100, all of your variables are related. In fact, if you want to do

c_1 = a_1 + b_1

then a, b, c are related. Therefore, I recommend that you combine all of your variables into a single data frame, where one column is a and another is b.

The question is how to combine your variables in a sensible way. To give a useful answer, though, can you tell us how these variables are created?


Perhaps this isn't suitable for your case. If not, a bit more information would be useful.
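
If the variables already exist as free-floating single numbers, one possible way to gather them into a data frame is mget(). This is only a sketch under that assumption; the values and the names n and dat are made up for illustration:

```r
# Hypothetical free-floating variables, standing in for a_1..a_100 and b_1..b_100
a_1 <- 1;  a_2 <- 2;  a_3 <- 3
b_1 <- 10; b_2 <- 20; b_3 <- 30

n <- 3
# mget() fetches several variables by name; unlist() flattens them into a vector
a <- unlist(mget(paste0("a_", 1:n)))
b <- unlist(mget(paste0("b_", 1:n)))

# One column per group of related variables
dat <- data.frame(a = a, b = b)
dat$c <- dat$a + dat$b
dat$c
```

Once the values sit in columns, the sum is a single vectorised statement instead of one statement per variable.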

csgillespie
  • 59,189
  • 14
  • 150
  • 185
  • I think they're trying to avoid doing exactly this 100 times. – joran Apr 20 '12 at 15:09
  • I think the OP has defined 100 individual vectors, a_1, etc. and is hoping to avoid typing a_1 + b_1, etc 100 times. – joran Apr 20 '12 at 15:11
2

This is actually a pretty interesting question. From my reading and recent (forced) use of SAS, the question seems to be about recoding variables in a SAS dataset within a data step using a bit of macro code. Otherwise, if free variables were being created, they would start with an & character. I think the example code would actually be better represented like this:

%macro recodevars;
data test;
  set test;

  %do i=1 %to 100;
  c_&i = a_&i + b_&i;
  %end;

run;
%mend recodevars;
%recodevars;

You could do something similar in R like this example:

test <- data.frame(vara1=1:10,varb1=2:11,vara2=3:12,varb2=4:13)

test[paste0("varc",1:2)] <- test[paste0("vara",1:2)] + test[paste0("varb",1:2)]

I'd be curious to hear what insight others have if the question applies to a data frame rather than free variables.

thelatemail
  • 91,185
  • 12
  • 128
  • 188
  • This is probably the most R-ish way of doing things, actually. (You could reduce the amount of text munging by using lists, but the basic idea would be the same.) It's generally a good idea to avoid eval/parse tricks unless absolutely necessary, and as you've shown, in this case they're not necessary. – Hong Ooi Apr 23 '12 at 06:06
2

This is really late, but you can actually do this without loops or *apply. I'm assuming that the variables are columns in a data frame (which makes sense if the OP is familiar with SAS datasets and macros).

df[paste("c", 1:100, sep="_")] <- df[paste("a", 1:100, sep="_")] +
                                  df[paste("b", 1:100, sep="_")]
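
To try this out without a real dataset, here is a small self-contained sketch of the same pattern. The data frame contents, the reduced count n = 3, and the helper names a_cols, b_cols and c_cols are assumptions made for illustration:

```r
# Hypothetical data frame with columns a_1..a_3 and b_1..b_3
n <- 3
df <- data.frame(a_1 = 1:2, a_2 = 3:4, a_3 = 5:6,
                 b_1 = 7:8, b_2 = 9:10, b_3 = 11:12)

a_cols <- paste("a", 1:n, sep = "_")
b_cols <- paste("b", 1:n, sep = "_")
c_cols <- paste("c", 1:n, sep = "_")

# Adding two data frames of equal shape adds them cell by cell
df[c_cols] <- df[a_cols] + df[b_cols]

# Spot check: a new column equals the sum of its a and b partners
stopifnot(all(df$c_2 == df$a_2 + df$b_2))
names(df)
```

The columns are paired by position, so a_cols and b_cols must list the suffixes in the same order.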

Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
  • This should be the accepted answer, as it is by far the simplest example here and gives the same result as the SAS code would. – thelatemail May 27 '14 at 05:56
1

The R way would be to use lists.

> a_1 = 1
> a_2 = 2
> a_3 = 3
> a_4 = 4
> a_5 = 5

> b_1 = 1
> b_2 = 2
> b_3 = 3
> b_4 = 4
> b_5 = 5

> a.list <- ls(pattern='^a_')
> a.list
[1] "a_1" "a_2" "a_3" "a_4" "a_5"

and define b.list the same way.

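if(length(a.list)==length(b.list)){
   # evaluate each pair of variable names and add the values
   c.list <- lapply(1:length(a.list), function(x) eval(parse(text=a.list[x])) + eval(parse(text=b.list[x])))

   # derive the names from a.list so suffixes stay matched (ls() sorts "a_10" before "a_2")
   c.list.names <- sub('^a_', 'c_', a.list)

   # c.list[[x]] extracts the value; c.list[x] would assign a one-element list
   invisible(lapply(1:length(c.list), function(x) assign(c.list.names[x], c.list[[x]], envir=.GlobalEnv)))
}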

I can't think of a way to do this without the eval(parse(yuk)) and assign unless you follow csgillespie's advice (which is the right way!)

Justin
  • 42,475
  • 9
  • 93
  • 111