Indexing variables in R

Question

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.

Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done: v[1]:=some vector, and the nth element is then called by the command v[1][n]. How can this be done in R? The actual problem is as follows:

I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of a different length. My incorrect code looks like this:

sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
                 x[k]<-rep(NA,M[k])
                 X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
                  if(x[k][i]>=0 & x[i]<=0.1056379){
                    X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
  else{
    X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
  }
 }
}

The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?

Thanks a lot :)

`[` is used for indexing, so `x[i]` extracts the `i`th element from a vector `x`. So `x[k]` is indeed not a valid variable name. In order to help you, a [reproducible example](http://stackoverflow.com/q/5963269/4303162) would be very useful. It seems that your example code would be reproducible, if you provided `eks_2016_kasko` and `rnegbin()` or `M`. — Stibu, Feb 23 '16 at 11:22
eks_2016_kasko=486689.1. Correct, rnegbin is indeed from the MASS package :) — user128836, Feb 23 '16 at 11:31

Glen Moutrie · Answer 1 · 2016-02-23T12:02:49.030

I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.

library(MASS)  # provides rnegbin()
sims <- 10

# R is not zero indexed, so add one: with M[k] = 0 the loop 1:M[k] would run over 1, 0
M <- rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1

# Create lists that hold one vector per simulation
x <- list()
X <- list()
for(k in 1:sims){
    x[[k]] <- rep(NA, M[k])
    X[[k]] <- rep(NA, M[k])
    for(i in 1:M[k]){
        x[[k]][i] <- runif(1, min=0, max=1)
        if(x[[k]][i] >= 0 & x[[k]][i] <= 0.1056379){
            X[[k]][i] <- rlnorm(1, 6.228244, 0.3565041)
        } else {
            X[[k]][i] <- rlnorm(1, 8.910837, 1.1890874)
        }
    }
}

This will work, and I think it is what you were trying to do, but it is not great R code. I strongly recommend using the lapply family instead of for loops and, if you need things to scale, learning to use data.table and parallelisation. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
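As a small illustration of why the `+ 1` matters in the loop above, here is a minimal sketch. The counts in `M` are made up for the example, and `seq_len()` is shown as one possible alternative to shifting the counts, not as what the code above does.

```r
# Hypothetical counts; the zero mimics a simulated M[k] equal to 0
M <- c(3, 0, 2)

# 1:0 counts down, so a loop written as for (i in 1:M[2]) would run twice
length(1:M[2])         # 2
# seq_len(0) is empty, so the loop body is simply skipped
length(seq_len(M[2]))  # 0

X <- vector("list", length(M))
for (k in seq_along(M)) {
  X[[k]] <- rep(NA_real_, M[k])
  for (i in seq_len(M[k])) {
    X[[k]][i] <- runif(1)
  }
}
lengths(X)             # 3 0 2
```

With `seq_len()` the simulated counts stay unchanged, and a count of zero just yields an empty vector.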

Hope this helps!

Stibu · Answer 2 · 2016-02-24T11:00:10.030

Let me start with a few remarks and then show you how your problem can be solved using R.

In R, there is usually no need for a for loop to assign several values to a vector. For example, to fill a vector of length 100 with uniformly distributed random numbers, you could write a loop like this:
```
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
  x1[i] <- runif(1, 0, 1)
}
```
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
```
# reset the seed so that the same random numbers are drawn again
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
```
As you see, the results are identical.

The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a valid variable name in R. [ is used for indexing, so x[k] refers to element k of a vector x. If x does not exist yet, the assignment fails with an error. If it does exist, you are trying to put a vector of length larger than 1 into a single element, so R keeps only the first value and issues a warning. What you probably want to use is a list, as you will see in the example below.
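To make the difference between `[` and `[[` concrete, here is a minimal sketch of a list holding two vectors of different lengths. The values are arbitrary stand-ins for the v1 and v2 of the question; `v[[1]][n]` plays the role of Maple's `v[1][n]`.

```r
# Hypothetical stand-ins for the two vectors v1 and v2 from the question
v <- list()
v[[1]] <- c(10, 20, 30)
v[[2]] <- c(1.5, 2.5)

v[[1]][2]      # second element of the first vector: 20
class(v[1])    # "list": single brackets return a sub-list, not the vector
class(v[[1]])  # "numeric": double brackets return the vector itself
lengths(v)     # 3 2
```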

So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.

# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims <- 10
M <- rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)

# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0, max=1)
  # as in the question: the first distribution is used where x <= 0.1056379
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X  # return the result visibly
}
# apply that function to each element of M
X <- lapply(M, calculate_X)

As you can see, there are no loops in that solution. I'll explain it starting from the end:

lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So you can get, e.g., the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). The contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: it creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores it in x. Then it creates a vector X that contains log-normally distributed random variables. The parameters of the distribution for each entry depend on the corresponding value in x.
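The element-wise behaviour of ifelse() described above can be checked on a small scale. The following is only a sketch: `draw_mixture` is a hypothetical helper that mirrors calculate_X() with the two sets of draws named separately, and the counts in `M` are made up so that the example runs without the MASS package.

```r
set.seed(1)  # arbitrary seed, only so that the example is reproducible

# Hypothetical helper mirroring calculate_X(), with the two sets of draws named
draw_mixture <- function(m, p = 0.1056379) {
  x <- runif(m)
  small <- rlnorm(m, 6.228244, 0.3565041)  # used where x <= p
  large <- rlnorm(m, 8.910837, 1.1890874)  # used where x > p
  # ifelse() picks, position by position, from small or large
  ifelse(x <= p, small, large)
}

M <- c(4, 2, 1000)  # hypothetical counts standing in for the simulated M
X <- lapply(M, draw_mixture)

lengths(X)          # 4 2 1000: one vector per element of M
X[[1]]              # the four draws belonging to M[1]
```

Note that both sets of m draws are generated in full and ifelse() then selects from them, so about half of the generated numbers are discarded. For sizes like those in the question this is usually not a concern.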

Actually, your code gives vectors of the correct length, but each entry takes only one of two different values , e.g. for X[[3]][3]=X[[3]][9]=X[[3]][550]=42775.2. So it looks as if it uses the same simulation again and again. — user128836, Feb 24 '16 at 10:46
You are right! In the second call of `rlnorm` I had written a `1` instead of `m`. It's corrected now. Sorry about that! — Stibu, Feb 24 '16 at 11:01
