I'm trying to create some simulated data. To create clustered data, I have assigned whether prescribers work in one or more than one local health area (LHA). Now, I am trying to assign a prescriber for a patient based on their LHA. The code for that is in the following codeblock.

for (i in seq_along(data$LHA)) {
  data$prescriber_id[i] <- sample(x = number_of_LHAs_worked$prescriber_id[
    number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]], 
                                  size = 1)
}

This loop works well for prescribers in more than one LHA (i.e. the length of x given to the sample function is larger than 1). However, it fails when a prescriber works in only one LHA, due to the behaviour of the sample function.

sample(x = 154, size = 1) 

When given only one number for x, R creates an index from 1 to x, and then randomly chooses a number in this range.
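The behaviour described above can be checked with a short sketch. The seed and the number of repetitions are arbitrary choices for illustration; per ?sample, the expansion to 1:x applies when x has length one, is numeric, and is at least 1.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Draw repeatedly from a single number: the values come from 1:154,
# not from the one-element set {154}
draws <- replicate(1000, sample(x = 154, size = 1))
range(draws)         # both ends lie between 1 and 154
mean(draws == 154)   # share of draws that actually returned 154

# A length-one character vector is not numeric, so it is not expanded
sample(x = "154", size = 1)
#> [1] "154"
```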

While I've worked out a solution for my purposes, I'm interested in seeing whether others have figured out ways to make the sample function behave more consistently. Specifically, I would like to force the sample function to draw only from the set specified.

sample(x = 154:155, size = 1)    # here the function chooses only a number in the set {154, 155}. 
alistaire
  • So what is the question, precisely? It is true that if you pass a single number to sample, R creates a vector from 1 to that number. The reason is that you cannot sample from a single number; that is not sampling. Sampling involves probability, and if there is just one number then it is a sure event. – Onyambu Dec 24 '17 at 01:30
  • @Onyambu If you will allow it, let's just consider it sampling from a set where one item has a 100% probability of selection. A 100% probability of selection is still sampling (consider probabilistic sampling in surveys of hard-to-reach populations). My question is simply this: I would like to make the sample function in R treat x as a set, regardless of the set's length. Thank you for commenting. – Prateek Sharma Dec 24 '17 at 02:01

2 Answers

?sample supplies an answer in its examples:

set.seed(47)

resample <- function(x, ...) x[sample.int(length(x), ...)]

# infers 100 means 1:100
sample(100, 1)
#> [1] 98

# stricter
resample(100, 1)
#> [1] 100

# still works normally if explicit
resample(1:100, 1)
#> [1] 77
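
As a rough sketch of how resample could slot into the loop from the question: the column names below come from the question, but the toy values (prescriber IDs 154, 155, 201 and the LHA labels "A" and "B") are made up for illustration, and vapply is used here in place of the original for loop.

```r
# resample as defined in the examples of ?sample
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical toy data: LHA "A" has two prescribers, LHA "B" has only one
number_of_LHAs_worked <- data.frame(
  prescriber_id   = c(154, 155, 201),
  assigned_LHAs_2 = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
data <- data.frame(LHA = c("A", "B", "A", "B"), stringsAsFactors = FALSE)

# For each patient, pick one prescriber among those working in the patient's LHA
data$prescriber_id <- vapply(data$LHA, function(lha) {
  candidates <- number_of_LHAs_worked$prescriber_id[
    number_of_LHAs_worked$assigned_LHAs_2 == lha]
  resample(candidates, size = 1)
}, numeric(1), USE.NAMES = FALSE)

data
```

With this toy data, patients in LHA "B" can only ever receive prescriber 201, rather than a number between 1 and 201. Note that resample still raises an error if an LHA has no matching prescriber at all, since there is nothing to draw from.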
alistaire
You can also use resample() from the gdata package. This saves you having to redefine resample in each new script. Just call

gdata::resample(x = 154, size = 1)

https://www.rdocumentation.org/packages/gdata/versions/2.18.0/topics/resample

trickytank