I need to assign weights to a sample for a country. I already have the population for each of the 85 regions, but I cannot work out how to perform the cluster sampling. Basically, I need to create 100 clusters of 15 units each, for 1500 respondents overall. I have an Excel file with all the variables for the 85 regions.
Question 1:
How can I use the already generated population probability to do a weighted randomization for 100 clusters (with 15 units each)?
Question 2:
I need to draw from the 85 regions and generate 100 clusters. Logically, the capital and some of the other big cities should have more than one cluster, because their larger populations give them a higher probability of being selected. How can I draw the clusters (15 units each) and assign a number of clusters to each region? For instance, if the cluster probability for the capital is 0.08 (8 percent), then about 8 of the 100 clusters (15 units each) should be assigned to the capital. How do I add that column?
Specifically, the problem with my current results is that I cannot generate the column with the number of clusters per region. For instance, region A should have 3 clusters, region B 1, and so forth.
Here is my code:
data1$clusProb1 = (data1$Population.2018)/sum(data1$Population.2018)
sampInd = c(1:length(data1$Federal.Subject),sample(1:length(data1$Federal.Subject), length(data1$Federal.Subject)*14, prob = data1$clusProb1, replace = TRUE))
sampFields = data.frame(id = 1:(length(data1$Federal.Subject)*15), Gender = sample(c(0,1), length(data1$Federal.Subject)*15, replace=TRUE))
sampleData = cbind(data1[sampInd,],sampFields)
sampleData
summary(sampleData)
The result should look like:
Cluster number Region
1 A
2 A
3 A
4 C
5 D
6
NOTE: A represents a region with a higher population, which should therefore have more clusters assigned to it.
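
To make the target concrete, here is a minimal sketch of the kind of result described above. It assumes the clusters are drawn with replacement, with probability proportional to population. The five-region data frame and its population figures are hypothetical stand-ins for data1, and the names n_clusters, cluster_size, nClusters, clusters and units are made up for illustration.

```r
set.seed(42)  # only so the illustration is reproducible

# Hypothetical stand-in for data1 (toy values, not the real 85 regions)
regions <- data.frame(
  Federal.Subject = c("A", "B", "C", "D", "E"),
  Population.2018 = c(12500000, 1200000, 2800000, 900000, 600000),
  stringsAsFactors = FALSE
)

n_clusters   <- 100
cluster_size <- 15

# Selection probability proportional to population
regions$clusProb <- regions$Population.2018 / sum(regions$Population.2018)

# Draw 100 clusters with replacement, so large regions can be selected repeatedly
drawn <- sample(regions$Federal.Subject, n_clusters,
                replace = TRUE, prob = regions$clusProb)

# One row per cluster: cluster number and the region it falls in
clusters <- data.frame(Cluster = seq_len(n_clusters), Region = sort(drawn))

# Number of clusters per region as a new column (0 for regions never drawn)
counts <- table(factor(drawn, levels = regions$Federal.Subject))
regions$nClusters <- as.integer(counts)

# Expand to one row per respondent: 15 units in each cluster
units <- clusters[rep(seq_len(n_clusters), each = cluster_size), ]
units$Unit <- rep(seq_len(cluster_size), times = n_clusters)
rownames(units) <- NULL
```

In this sketch regions$nClusters is the "clusters per region" column, clusters has the Cluster number / Region layout shown above, and units has 1500 rows. Because the draw is random, the count for a region only matches its probability times 100 on average, not exactly.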