Generate some simple dummy data in R

Question

I just want some random data to experiment with different prediction models.

My code:

x <- 0

for (i in 1:200)
{
    num <- runif(1, 0, 500)
    neg <- round(runif(5, -1, 0))
    percent <- ((0.01 * runif(1, 1, 10)) * num)

    x[i] = num + (neg * percent)
}

The idea is that this should generate 200 points.

num is a random number between 0 and 500

neg is either -1 or 1, just to add some flexibility to the random offset (negative or positive offset of a randomly generated point)

percent is just a random percentage between 1% and 10% of the originally generated random number to either be added or subtracted

Very similar code that I've made in my main language, C#, works very well and generates proper plots. I'm more-or-less trying to port that code.

Whenever I run the above, I get the following errors (a lot of them):

number of items to replace is not a multiple of replacement length

It's triggered on the last line of code in the for loop.

I'd love to be able to fix this. Any help is appreciated. Thank you!

`neg` has length 5 because you supplied 5 to its `n` argument. You are therefore adding `num` of length 1 to `neg` of length 5, and the length-5 result cannot be assigned to the single element `x[i]` – Chrisss, Sep 25 '16 at 03:04
@Chrisss How could I solve this? Is `<-` exclusively an add operator? `=` provides the same errors – Dan, Sep 25 '16 at 03:09
If `neg` is either -1 or 1 then you should do: `neg = round(runif(200,0,1)) * 2 - 1` because your expression will return numbers -1 or 0 – R. Schifini, Sep 25 '16 at 03:17
@KingDan, `<-` is an assignment operator and is similar to the function `assign`. `=` is also an assignment operator; however, its "powers" are more limited. Read this http://stackoverflow.com/questions/2271575/whats-the-difference-between-and-in-r – Jacob H, Sep 25 '16 at 03:23
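
The length mismatch described in the comments can be checked directly. This is a minimal sketch that reuses the variable names from the question; the seed and the name `neg1` are assumptions added only for illustration.

```r
set.seed(1)                      # arbitrary seed (assumption), for reproducibility only

num <- runif(1, 0, 500)          # one value, so length 1
neg <- round(runif(5, -1, 0))    # n = 5, so length 5; each element is -1 or 0

length(num)                      # 1
length(neg)                      # 5
length(num + neg)                # 5: the length-1 vector is recycled

# Drawing a single sign per point keeps the result at length 1,
# so it fits into one element such as x[i].
neg1 <- sample(c(-1, 1), 1)      # hypothetical name for a single -1 or 1
length(num + neg1)               # 1
```

Assigning a length-5 value to the single slot `x[i]` is what triggers the "number of items to replace" message on every iteration.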

Jacob H · Accepted Answer · 2016-09-25T03:15:59.457

4

Chrisss has already pointed out your problem in his comment. However, you're doing a lot of bad things from an R programming perspective. The following approach is better:

N <- 200

d <- data.frame(x = rep(NA, N))

num <- runif(N, 0, 500)
neg <- sample(c(1, -1), N, replace = TRUE) # jrdnmdhl pointed this out in his post
percent <- ((0.01 * runif(N, 1, 10)) * num)
d$x <- num + (neg * percent)

Why is this better? Two reasons. First, we are avoiding a for loop. R is an interpreted, high-level language, so explicit loops are slow compared with vectorized operations. Second, your code does not preallocate memory: `x` starts with length 1 and grows on every iteration, so R has to find more memory each time. Skipping the preallocation step slows things down as well.
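
If a loop really is needed, the preallocation point can be sketched as follows. This is an illustration rather than the answer's own code: it keeps the question's variable names and assumes `numeric(N)` as one way to reserve the vector up front.

```r
N <- 200
x <- numeric(N)                      # preallocate: N zeros, filled in below

for (i in seq_len(N)) {
  num     <- runif(1, 0, 500)        # one random base value
  neg     <- sample(c(-1, 1), 1)     # one random sign, -1 or 1
  percent <- 0.01 * runif(1, 1, 10) * num   # 1% to 10% of num
  x[i]    <- num + neg * percent     # length-1 value into one slot
}

length(x)                            # 200
```

The vectorized version above remains the more idiomatic choice; the loop is shown only to make the preallocation step concrete.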

A great resource is Hadley Wickham's Advanced R. To learn more about the first and second reasons, read this and that.

edited Sep 25 '16 at 03:15

answered Sep 25 '16 at 03:10


1

Just do d <- data.frame(x = num + (neg * percent)) in the end. There is no need to preallocate if you don't use a loop. – Roland Sep 25 '16 at 07:23
Thanks for the answer - What does `NA` signify in R, as per your 2nd line? – Dan Sep 26 '16 at 01:45
@KingDan `NA` represents a missing value. More specifically, it is a logical constant which represents neither `TRUE` nor `FALSE`. It is common to use `NA` to preallocate memory, because presumably, when you modify the object, you will be replacing `NA` with actual data. You could, however, have preallocated memory using anything, for example `9999`. For more on `NA` read this https://www.r-bloggers.com/r-na-vs-null/ – Jacob H Sep 27 '16 at 20:06
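
A short sketch of how `NA` behaves may help here. The vector name `placeholder` and the value `4.2` are made up for illustration; `NA_real_` is the numeric variant of `NA`.

```r
class(NA)                            # "logical"
is.na(NA == TRUE)                    # TRUE: comparing with NA yields NA

placeholder <- rep(NA_real_, 3)      # hypothetical numeric vector of missing values
placeholder[2] <- 4.2                # replace one placeholder with real data

is.na(placeholder)                   # TRUE FALSE TRUE
mean(placeholder)                    # NA, because missing values propagate
mean(placeholder, na.rm = TRUE)      # 4.2, missing values dropped first
```

One advantage of `NA` over a sentinel such as `9999` is that a slot that was never filled shows up as missing instead of silently entering later calculations.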

score 2 · Answer 2 · answered Sep 25 '16 at 03:14

The commenter mentioned the main problem, but your code would be much faster if vectorized. Also, your description of 'neg' is not consistent with what it is doing. Your code doesn't generate either -1 or 1. Instead, it generates either -1 or 0. The code below will generate either -1 or 1 for the neg variable.

num = runif(200, 0, 500)
neg = sample(c(1, -1), 200, replace = TRUE)
percent = ((0.01 * runif(200, 1, 10)) * num)
x = num + (neg * percent)
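
A quick sanity check of the vectorized approach might look like the sketch below. The seed, the name `n`, and the small tolerance are assumptions added for illustration; the other names follow the code above.

```r
set.seed(42)                         # arbitrary seed (assumption), for reproducibility only
n <- 200                             # hypothetical name for the number of points

num     <- runif(n, 0, 500)
neg     <- sample(c(1, -1), n, replace = TRUE)
percent <- 0.01 * runif(n, 1, 10) * num
x       <- num + neg * percent

table(neg)                           # counts of -1 and 1, nothing else
length(x)                            # 200

# Every point should sit within 10% of its base value
# (tiny tolerance added for floating-point rounding).
all(abs(x - num) <= 0.10 * num + 1e-9)
```

Plotting `x` against `num`, for example with `plot(num, x)`, is another way to confirm that the offsets go in both directions.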
