I need to create 100 independent samples such that each sample contains a single observation per customer. The column cust_id has repeated values, and each sample should hold exactly one record for each customer. This is the code I tried:

N_Sample <- 100

for (s in 1:N_Sample) {
  for (i in unique(data$cust_id)) {
    k <- sample(1:length(data$cust_id[data$cust_id == i]), 1)
  }
}

Is there another way to create samples for this scenario? Also, there is a column "Balance" in my data set. For each sample I need to calculate the 'Total Balance', and then the average of Total Balance across all 100 samples.

slava-kohut
VA25
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 17 '20 at 20:17

1 Answer

The answer would depend on how you define your sampling methodology.

If you are sampling 100 customer ids from the dataframe column, you can do the following. If you don't want the same id to be drawn more than once, set replace = FALSE (this requires at least 100 unique ids):

sampled_obs = sample(x = unique(df$cust_id), size = 100, replace = TRUE)

If you are sampling 100 row numbers from the rows of your dataframe, you can do:

sampled_obs = sample(x = 1:nrow(df), size = 100, replace = TRUE)
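
To illustrate what row-level sampling gives you, here is a minimal sketch on a made-up data frame (the values in df are hypothetical; only the column names cust_id and Balance come from the question). It uses the sampled row numbers to pull rows and shows that the same customer can turn up many times, so this variant on its own does not give one record per customer.

```r
# Hypothetical toy data: 3 customers with repeated cust_id values.
df <- data.frame(
  cust_id = c(1, 1, 2, 3, 3, 3),
  Balance = c(10, 20, 30, 40, 50, 60)
)

# Draw 100 row numbers with replacement, then subset the data frame by them.
sampled_obs <- sample(x = seq_len(nrow(df)), size = 100, replace = TRUE)
sampled_rows <- df[sampled_obs, ]

# Count how often each customer was drawn; with 100 draws and only
# 3 customers, repeats are unavoidable.
draws_per_customer <- table(sampled_rows$cust_id)
has_repeated_customer <- any(duplicated(sampled_rows$cust_id))
```
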

If each number has to be sampled from the set 1 to the current row number of the dataframe (which I believe is the case in your example), you can do the following:

samples = vector('numeric', length = nrow(df))
# for row i, draw one number from 1..i
for(i in seq_len(nrow(df))){
    samples[i] = sample(x = 1:i, size = 1)
}
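
If the goal is the one stated in the question (100 samples, each holding exactly one randomly chosen row per customer, then the total Balance per sample and its average over all samples), a rough sketch could look like the one below. This is only one possible reading of the requirement. The toy data, the helper pick_one, and the names rows_by_cust, total_balance and mean_total_balance are made up for illustration; only cust_id, Balance and N_Sample come from the question.

```r
# Hypothetical toy data; replace with your own data frame.
set.seed(42)  # only so the sketch is reproducible
data <- data.frame(
  cust_id = c(1, 1, 1, 2, 2, 3),
  Balance = c(100, 150, 120, 50, 70, 300)
)

N_Sample <- 100

# Row numbers grouped by customer: one integer vector per cust_id.
rows_by_cust <- split(seq_len(nrow(data)), data$cust_id)

# Pick one element of idx. Indexing through sample.int() avoids the
# special case where sample(x, 1) treats a single number x as 1:x.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]

# For each sample: draw one row per customer and sum Balance over those rows.
total_balance <- vapply(seq_len(N_Sample), function(s) {
  picked <- vapply(rows_by_cust, pick_one, integer(1))
  sum(data$Balance[picked])
}, numeric(1))

# Average of the per-sample totals across all samples.
mean_total_balance <- mean(total_balance)
```

Each entry of total_balance is the 'Total Balance' of one sample, and every sample has as many rows as there are unique cust_id values. Grouping the row numbers once with split() avoids re-filtering the data for every customer inside the loop.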
monte
  • The total number of rows should equal the number of unique cust_id values in my data set. – VA25 Jul 18 '20 at 06:45
  • Is that the sample size? – monte Jul 18 '20 at 10:02
  • This is what is currently being used. 100 is the total number of samples we need to create; it is not the number of rows. for (i in unique(data$cust_id)){ k=sample(1:length(data$cust_id[data$cust_id==i]),1) – VA25 Jul 18 '20 at 14:16