Generate a set of random unique integers from an interval

Question

I am trying to build some machine learning models, so I need training data and validation data.

Suppose I have N examples and want to select x of them at random from a data frame.

For example, suppose I have 100 examples and need 10 random numbers. Is there a way to efficiently generate 10 random integers so that I can extract the training data from my sample data?

I tried using a while loop that replaces repeated numbers one at a time, but the running time is not ideal, so I am looking for a more efficient way to do it.

Can anyone help, please?

Konrad Rudolph · Answer 1 · 2018-10-11T15:42:39.823

80

sample (or sample.int) does this:

sample.int(100, 10)
# [1] 58 83 54 68 53  4 71 11 75 90

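will generate ten random numbers from the range 1–100. By default the sampling is done without replacement, so the numbers are unique. If repeated values are acceptable, pass replace = TRUE, which samples with replacement: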

sample.int(20, 10, replace = TRUE)
# [1] 10  2 11 13  9  9  3 13  3 17

More generally, sample samples n observations from a vector of arbitrary values.
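
As a minimal sketch of that last point, the snippet below samples from an arbitrary vector and from an interval that does not start at 1. The `fruits` vector, the 50..150 interval, and the seed are made up for illustration and do not come from the question.

```r
set.seed(42)  # arbitrary seed, only so the draw is reproducible

# Hypothetical vector of labels; any atomic vector works the same way.
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Draw 3 distinct elements (replace = FALSE is the default).
picked <- sample(fruits, 3)

# Draw 10 unique integers from the interval 50..150
# by sampling from the explicit sequence.
ids <- sample(50:150, 10)

# Same idea with sample.int: draw from 1..101, then shift up by 49.
ids2 <- sample.int(101, 10) + 49L
```

Sampling from the explicit sequence is the more readable option; the shifted `sample.int` form avoids building the sequence, which may matter for very wide intervals.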

answered Jul 21 '13 at 13:59

1

thanks! let me try out your solution - no, i need my training data to be unique, but thanks for the additional information!! – Low Yi Xiang Jul 21 '13 at 14:03
2

Also, @LowYiXiang, you might find `head` and `tail` useful here: `idx <- sample.int(100); train.idx <- head(idx, 10); test.idx <- tail(idx, -10);` – flodel Jul 21 '13 at 14:13
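
The `head`/`tail` idea from the comment above can be spelled out as a small runnable sketch. The data frame `my_data` and the seed are hypothetical stand-ins; only the sizes 100 and 10 come from the question.

```r
set.seed(1)  # arbitrary seed for reproducibility

n <- 100        # total number of examples
n_train <- 10   # desired training size

idx <- sample.int(n)               # random permutation of 1..n
train_idx <- head(idx, n_train)    # first 10 shuffled indices
test_idx  <- tail(idx, -n_train)   # everything except the first 10

# Hypothetical data frame standing in for the real data.
my_data <- data.frame(id = seq_len(n), y = rnorm(n))
train <- my_data[train_idx, ]
test  <- my_data[test_idx, ]
```

Because both index sets are slices of one permutation, they cannot overlap and together they cover every row.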

score 3 · Answer 2 · answered Jul 21 '13 at 19:42

If I understand correctly, you are trying to create a hold-out sampling. This is usually done using probabilities. So if you have n.rows samples and want a fraction of training.fraction to be used for training, you may do something like this:

select.training <- runif(n=n.rows) < training.fraction
data.training <- my.data[select.training, ]
data.testing <- my.data[!select.training, ]

If you want to specify EXACT number of training cases, you may do something like:

indices.training <- sample(x=seq(n.rows), size=training.size, replace=FALSE) #replace=FALSE makes sure the indices are unique
data.training <- my.data[indices.training, ]
data.testing <- my.data[-indices.training, ] #note that index negation means "take everything except for those"
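
To compare the two approaches side by side, here is a self-contained sketch. The contents of `my.data`, the seed, and the values 0.1 and 10 are assumptions chosen for illustration; the variable names follow the snippets above.

```r
set.seed(123)  # arbitrary seed for reproducibility

my.data <- data.frame(x = rnorm(100), y = rnorm(100))  # hypothetical data
n.rows <- nrow(my.data)

# Approach 1: each row joins the training set with probability 0.1,
# so the training size is only approximately 10.
training.fraction <- 0.1
select.training <- runif(n = n.rows) < training.fraction
prob.training <- my.data[select.training, ]
prob.testing  <- my.data[!select.training, ]

# Approach 2: exactly 10 unique row indices.
training.size <- 10
indices.training <- sample(x = seq_len(n.rows), size = training.size, replace = FALSE)
exact.training <- my.data[indices.training, ]
exact.testing  <- my.data[-indices.training, ]
```

The sketch uses `seq_len(n.rows)` rather than `seq(n.rows)` because `seq_len(0)` is empty while `seq(0)` yields `1 0`; for a non-empty data frame the two behave the same.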

score 0 · Answer 3 · edited Feb 23 '19 at 19:21

from the raster package:

raster::sampleInt(242, 10, replace = FALSE)
##  95 230 148 183  38  98 137 110 188  39

Base R's sample.int, by contrast, may fail if the limits are too large:

sample.int(1e+12, 10)

answered Feb 01 '19 at 03:26 by Barry · edited by PatrickT
