
I’d like to ask you a question again if you have some time.

Below is the same df data frame I used in my previous questions: a simplified version of my real data frame, which would be too unwieldy to show here. The main characteristics are the same, however.

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
df <-data.frame(id,a,b,c,d,e)
df
df.list <- lapply(df[,2:6],function(x, id){ t(table(x, id, useNA = "ifany")) }, df$id)
df.list

Basically, what I've created here is, for each of the columns 'a' to 'e', a table counting how often each distinct value occurs, grouped by the ids in column id.
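To make the shape of each list element concrete, here is a minimal sketch on a tiny made-up vector (the values of `id` and `x` below are hypothetical, not taken from df). It assumes the same `t(table(...))` call as above and shows that the result has one row per id and one column per distinct value, with an extra NA column only when NAs are present.

```r
# Hypothetical toy data: two ids, three observations each, one NA
id <- c(1, 1, 1, 2, 2, 2)
x  <- c(3, 1, 3, 2, 2, NA)

# Same call as in the question: count values of x within each id,
# then transpose so that ids are rows and values are columns
counts <- t(table(x, id, useNA = "ifany"))
counts

dim(counts)        # rows = ids, columns = distinct values (including NA)
counts["1", "3"]   # how often the value 3 occurs for id 1
```

Dividing such a table by the number of rows per id, as the loop below does with 10, turns the counts into proportions.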

In the next step I created a loop which looks as follows:

for (i in names(df.list))
{
  df.list[i]
  assign( paste("var",i,sep=""),
          (matrix(matrix(unlist(df.list[i])),ncol=nlevels(factor(df[,i])),nrow=3))/10
        )
}

It divides every element of the list created before by 10. This is only the first half of the loop I want to implement, but it works properly so far. Just run the code above in R and then inspect the results:

vara
varb
varc
vard
vare

The more difficult part comes when I try to add the "for (k in 1:3)" section. So let's run the loop again, this time with the second half included:

for (i in names(df.list))
{
  df.list[i]
  assign( paste("var",i,sep=""),
          (matrix(matrix(unlist(df.list[i])),ncol=nlevels(factor(df[,i])),nrow=3))/10
        )

  for (k in 1:3)
    assign( paste("var",i,k,sep="."),
            vari[k,]*5 
          )
}

My problem is the vari[k,]*5 line. (In reality I need to do a matrix multiplication at this point.) The code does not recognize vari, even though I have already defined i. I do not want to write out vara, varb, varc, etc. by hand, because I need this to be automated: I will have to refresh my real df data frame on a regular basis, so the number of variables might change over time (not necessarily a to e, but perhaps a to f, or a to y, etc.).

So I get the following error message:

Error in assign(paste("var", i, k, sep = "."), vari[k, ] * 5): object 'vari' not found

What am I missing or doing wrong here? I just want to refer to another object that I created earlier in the same loop, but R can't find it. Is there a proper solution?

Thanks very much

tim_yates
Laszlo

3 Answers


I think you can replace

vari[k,]*5 

with

get( paste( "var", i, sep="" ) )[k,]*5 

Do you really need to create variables in this way, though? I worry about the namespace getting out of hand if your data set gets any larger. It might be better to just create a list object, or to define your own environment with new.env and set the variables in that environment instead of the global one.
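A minimal sketch of both alternatives, assuming a toy named list of matrices in place of df.list (the names `counts`, `scaled` and `results` are made up for illustration):

```r
# Hypothetical stand-in for df.list: a named list of count matrices
counts <- list(a = matrix(1:6, nrow = 3), b = matrix(6:1, nrow = 3))

# Option 1: keep everything in a named list instead of separate variables
scaled <- lapply(counts, function(m) m / 10)
scaled[["a"]][2, ] * 5          # row 2 of the "a" result, times 5

# Option 2: keep the generated names, but store them in their own environment
results <- new.env()
for (nm in names(counts)) {
  assign(paste0("var", nm), counts[[nm]] / 10, envir = results)
}
ls(results)                               # names created in that environment
get("vara", envir = results)[2, ] * 5     # same value as in option 1
```

With the list, the loop variable indexes the data directly, so no names have to be pasted together at all.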

tim_yates
  • Thanks Tim! You are right, I am also a little bit afraid of the size of namespace getting out of control but still have not found a better solution yet. So I give this "new.env" thing a try, sounds great as a matter of fact. However this is a field I do not know much about yet... Until I dig a little bit deeper in the depths of "new.env" your suggestion "get" will also be perfect, so thank you. – Laszlo Mar 22 '11 at 13:09
  • No worries! :) Yeah, you can create an environment object with `new.env`, then pass it as an `envir=` parameter to `assign`, `get`, `ls`, etc... Have fun! – tim_yates Mar 22 '11 at 13:15
  • 4
    Just use a list. No need to mess around with environments. – hadley Mar 22 '11 at 14:17

@hadley I'd agree. From what I've seen vectorizing loops is almost always the right answer.

@Laszlo Have a look at these examples: "Vectorizing a loop" and "Coding the R-ight way - avoiding the for loop".

Community
Rob

vari is indeed not recognized: you created vara, varb, varc, vard, ... but never an object called vari. R does NOT substitute the value of i into the name vari; it looks for an object that is literally named vari.
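A small sketch of the difference, assuming a fresh session in which only a toy `vara` exists (the matrix values are made up):

```r
i <- "a"
vara <- matrix(1:6, nrow = 3)    # hypothetical toy matrix

# R treats vari as one symbol; the value of i plays no part in the lookup
exists("vari")                   # FALSE unless an object named vari happens to exist

# To go from the value of i to an object, build the name as a string first
nm <- paste("var", i, sep = "")  # the string "vara"
row2 <- get(nm)[2, ] * 5         # look the object up by that name, then index it
row2
```

This explains the error message, but it is only shown to illustrate the lookup rule, not as a recommended pattern.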

What you want to achieve can easily be done with:

lapply(df.list,function(i) i/10*5)

I presume this is just an example, and your actual code is more complex. But still, just use lapply and keep in mind that a table IS a matrix. All that unlist/matrix stuff is completely unnecessary.

> is.matrix(df.list[[1]])
[1] TRUE

If you really, really want to drop the table attributes, and you want to give the specified names, then your code can be simplified to:

VarList <- sapply(names(df.list),function(i){
  out <- df.list[[i]]/10*5

  # in case you want to drop all table attributes; ncol must be named here,
  # because the second positional argument of matrix() is nrow
  out <- matrix(out,ncol=ncol(out))

  colnames(out) <- paste(
                     paste("var",i,sep=""),
                     1:ncol(out),
                     sep=".")
  out
},USE.NAMES=TRUE,simplify=FALSE)

This gives you a list of matrices whose column names are formed the way you want. It also allows you to do something like

> VarList[["d"]][,1:2]
     vard.1 vard.2
[1,]    1.0    1.5
[2,]    1.0    3.0
[3,]    2.5    1.5

which essentially allows you to select the vars by number as an index, and the matrix by the name of the initial variable. Stick with that: assigning to the global environment and relying on constructed names is wickedly dangerous.
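The inner "for (k in 1:3)" step from the question can be handled in the same list-based way. The following is only a sketch under assumptions: the data frame `toy`, its columns, and the divisor 4 (rows per id) are made up, and the row-wise `* 5` stands in for whatever computation is really needed.

```r
# Hypothetical toy data: three ids with four rows each
id  <- rep(1:3, each = 4)
x   <- c(1, 2, 2, 3, 1, 1, 3, 3, 2, 2, 2, 1)
y   <- c(3, 3, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3)
toy <- data.frame(id, x, y)

# One plain matrix of proportions per column (ids in rows, values in columns)
props <- lapply(toy[, -1], function(col) unclass(table(toy$id, col)) / 4)

# Replacement for the inner loop: one result per row k, kept in a nested list
rows <- lapply(props, function(m) {
  lapply(seq_len(nrow(m)), function(k) m[k, ] * 5)
})

rows[["x"]][[2]]   # what the question would have called var.x.2
```

Because everything is addressed by column name and row index, the code keeps working when columns are added to or removed from the data frame.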

Joris Meys