Creating clusters via weighted randomization

Question

I need to assign weights to a sample for a country. I already have the population of each of the 85 regions, but I cannot work out how to perform the cluster sampling. I need to create 100 clusters of 15 units each, 1,500 respondents overall. I have an Excel file with all the variables for the 85 regions.

Question 1:

How can I use the population probabilities I have already computed to do a weighted random draw of 100 clusters (15 units each)?
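A minimal sketch of Question 1, assuming `data1` has one row per region with the `Federal.Subject` and `Population.2018` columns used in the code further down (the five toy regions and their populations are made up). Drawing 100 times with replacement, with `prob` proportional to population, gives each cluster a region, and populous regions can be drawn several times. `sample()` normalizes the weights itself, so raw populations work as well as shares.

```r
set.seed(2018)  # make the draw reproducible

# Hypothetical stand-in for the 85-region Excel data
data1 = data.frame(
  Federal.Subject = c("A", "B", "C", "D", "E"),
  Population.2018 = c(12000000, 5000000, 2500000, 1000000, 500000),
  stringsAsFactors = FALSE
)

nClusters = 100

# Draw a region for each of the 100 clusters, with probability proportional to population
clusterRegion = sample(data1$Federal.Subject, nClusters,
                       replace = TRUE, prob = data1$Population.2018)

# One row per cluster: its number and the region it was assigned to
clusters = data.frame(Cluster = seq_len(nClusters), Region = clusterRegion,
                      stringsAsFactors = FALSE)
head(clusters)
```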

Question 2:

I need to draw from the 85 regions and generate 100 clusters. Logically, the capital and some of the other big cities should get more than one cluster, because their larger populations give them a higher probability of receiving a cluster. How can I draw the clusters (15 units each) and assign a number of clusters to each region? For instance, if the capital's cluster probability is 0.08 (8 percent), then 8 of the 100 clusters (15 units each) should be assigned to the capital. How do I add that column?
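If the number of clusters per region should be fixed in proportion to population rather than drawn at random, one common option (swapped in here, not something from the question) is the largest-remainder method: give each region `floor(100 * share)` clusters, then hand the leftover clusters to the regions with the largest fractional parts. A population share of 0.08 then yields 8 clusters. The toy data and the `nClusters` column name are hypothetical.

```r
# Hypothetical stand-in for the 85-region data
data1 = data.frame(
  Federal.Subject = c("A", "B", "C", "D", "E"),
  Population.2018 = c(12000000, 5000000, 2500000, 1000000, 500000),
  stringsAsFactors = FALSE
)

totalClusters = 100

# Population share of each region (same as clusProb1 in the question)
data1$clusProb1 = data1$Population.2018 / sum(data1$Population.2018)

# Exact (fractional) number of clusters each region would get
quota = totalClusters * data1$clusProb1

# Largest-remainder allocation: whole parts first, leftovers to the largest remainders
data1$nClusters = floor(quota)
leftover = totalClusters - sum(data1$nClusters)
if (leftover > 0) {
  topUp = order(quota - data1$nClusters, decreasing = TRUE)[seq_len(leftover)]
  data1$nClusters[topUp] = data1$nClusters[topUp] + 1
}

data1[, c("Federal.Subject", "clusProb1", "nClusters")]
```

Under this scheme very small regions can end up with 0 clusters; if every region must get at least one, that constraint has to be added separately.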

Specifically, the problem with my current results is that I cannot generate the column with the number of clusters per region, for instance 3 clusters for region A, 1 for region B, and so forth.
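If the allocation should stay random, the per-region column can be obtained by counting how often each region was drawn. A sketch under the same assumptions as above (hypothetical toy data, hypothetical `nClusters` column): wrapping the draws in a factor whose levels are all region names makes `table()` report 0 for regions that were never drawn.

```r
set.seed(2018)

# Hypothetical stand-in for the 85-region data
data1 = data.frame(
  Federal.Subject = c("A", "B", "C", "D", "E"),
  Population.2018 = c(12000000, 5000000, 2500000, 1000000, 500000),
  stringsAsFactors = FALSE
)

# Weighted draw: one region per cluster, 100 clusters
drawn = sample(data1$Federal.Subject, 100, replace = TRUE, prob = data1$Population.2018)

# Count clusters per region; the factor levels keep undrawn regions at 0
counts = table(factor(drawn, levels = data1$Federal.Subject))

# Look the counts up by region name so the column lines up with data1's rows
data1$nClusters = as.integer(counts[data1$Federal.Subject])

data1
```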

Here is my code:

data1$clusProb1 = data1$Population.2018 / sum(data1$Population.2018)  # population share of each region

n = nrow(data1)  # number of regions (85)

# Every region once, plus n*14 extra rows drawn in proportion to population
sampInd = c(1:n, sample(1:n, n * 14, prob = data1$clusProb1, replace = TRUE))

# Respondent-level fields: an id and a random 0/1 gender for each sampled row
sampFields = data.frame(id = 1:(n * 15), Gender = sample(c(0, 1), n * 15, replace = TRUE))

sampleData = cbind(data1[sampInd, ], sampFields)
sampleData

summary(sampleData)
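Note that this code produces 85 × 15 = 1,275 rows (every region once plus 1,190 weighted draws), not 100 clusters of 15. A sketch that builds the 1,500 respondent rows instead, assuming each cluster's region is drawn as in Question 1; the toy data are hypothetical, and `Gender` is random, as in the code above.

```r
set.seed(2018)

# Hypothetical stand-in for the 85-region data
data1 = data.frame(
  Federal.Subject = c("A", "B", "C", "D", "E"),
  Population.2018 = c(12000000, 5000000, 2500000, 1000000, 500000),
  stringsAsFactors = FALSE
)

nClusters = 100   # clusters to draw
clusterSize = 15  # respondents per cluster

# Region of each cluster, drawn with probability proportional to population
clusterRegion = sample(data1$Federal.Subject, nClusters,
                       replace = TRUE, prob = data1$Population.2018)

# One row per respondent: each cluster number and region repeated clusterSize times
sampleData = data.frame(
  id = seq_len(nClusters * clusterSize),
  Cluster = rep(seq_len(nClusters), each = clusterSize),
  Federal.Subject = rep(clusterRegion, each = clusterSize),
  Gender = sample(c(0, 1), nClusters * clusterSize, replace = TRUE),
  stringsAsFactors = FALSE
)

# Attach the region-level variables to every respondent, then restore id order
sampleData = merge(sampleData, data1, by = "Federal.Subject")
sampleData = sampleData[order(sampleData$id), ]

summary(sampleData)
```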

The result should look like:

Cluster number   Region
1                A
2                A
3                A
4                C
5                D
6

NOTE: A represents a region with a larger population, which should therefore have more clusters assigned to it.
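Given a per-region column of cluster counts (the hypothetical `nClusters`, however it was obtained), the Cluster number / Region listing follows from `rep()` with `times =`. A sketch with made-up counts: sorting regions by population first gives the largest region the lowest cluster numbers, and a region with 0 clusters simply does not appear.

```r
# Hypothetical per-region cluster counts
data1 = data.frame(
  Federal.Subject = c("C", "A", "D", "B"),
  Population.2018 = c(2500000, 12000000, 1000000, 300000),
  nClusters = c(1, 3, 1, 0),
  stringsAsFactors = FALSE
)

# Largest regions first, so they take the lowest cluster numbers
data1 = data1[order(data1$Population.2018, decreasing = TRUE), ]

# Repeat each region name once per assigned cluster
regionPerCluster = rep(data1$Federal.Subject, times = data1$nClusters)

clusterTable = data.frame(
  Cluster.number = seq_along(regionPerCluster),
  Region = regionPerCluster,
  stringsAsFactors = FALSE
)
clusterTable
```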

When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Have you tried anything? Where exactly are you getting stuck? — MrFlick, Mar 12 '18 at 14:20
Edit your question to include the sample so it can be properly formatted. Then describe exactly what's wrong with what you have so far. — MrFlick, Mar 12 '18 at 14:34
Sorry about that, @MrFlick. I was trying to format the code in the question tab but could not do it fast enough. The comment above is the code I have so far. — Big Z, Mar 12 '18 at 14:38
