I have a dataset (STATPOP2016 from the Swiss Federal Statistical Office) that contains the number of households of each size for every hectare of Swiss territory. In other words, for each hectare i I have:
x1: households consisting of one individual
x2: households consisting of two individuals
...
x6: households with 6 or more individuals (I treat them as having 6 people for simplicity).
I need to create a variable that gives the interquartile range of household size for each hectare. I have code that works, but it is very slow. Is there a smarter way to do the same thing?
Here is my code:
library(Hmisc)  # provides wtd.quantile()

# Vector that contains all possible household sizes
vector_hh_size <- c(1:6)
# Variable for the interquartile range of household size. A is my data frame
A$hh_size_IQR <- 0
# Vector that contains the frequency of each household size in a given hectare
vector_hh_frequency <- c(0,0,0,0,0,0)
for (i in 1:NROW(A)) {
  for (j in 1:6) {
    # Read column hh<j>, row i, by building and evaluating the expression as text
    vector_hh_frequency[j] <- eval(parse(text = paste("A$hh",j,"[",i,"]",sep = "")))
  }
  # Default probs are 0, 0.25, 0.5, 0.75, 1, so [4] - [2] is Q3 - Q1
  A$hh_size_IQR[i] <- wtd.quantile(vector_hh_size, weights = vector_hh_frequency)[4] - wtd.quantile(vector_hh_size, weights = vector_hh_frequency)[2]
}
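To make the cell lookup in the inner loop concrete, here is a minimal sketch showing that the `eval(parse(...))` expression reads the same values as ordinary indexing by column name. `A_demo`, `hh_cols`, `freq_eval` and `freq_index` are hypothetical names used only for this illustration; the two rows are made-up stand-ins for the real data frame `A`.

```r
# Hypothetical two-row stand-in for the real data frame A
A_demo <- data.frame(hh1 = c(3, 0), hh2 = c(6, 3), hh3 = c(3, 0),
                     hh4 = c(3, 0), hh5 = c(0, 3), hh6 = c(0, 0))
hh_cols <- paste0("hh", 1:6)

i <- 1

# Lookup as in the loop above: build a string, parse it, evaluate it
freq_eval <- sapply(1:6, function(j) {
  eval(parse(text = paste("A_demo$hh", j, "[", i, "]", sep = "")))
})

# The same six values read with plain indexing on row i
freq_index <- unlist(A_demo[i, hh_cols], use.names = FALSE)

# TRUE when both approaches return the same frequency vector
same_values <- identical(freq_eval, freq_index)
```

This only illustrates the lookup; it makes no claim about how much of the running time that step accounts for.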
Here is an example of the data:
#OBS  hh1  hh2  hh3  hh4  hh5  hh6  IQR
   1    0    3    0    0    0    0    0
   2    0    3    0    0    0    0    0
   3    0    0    3    0    0    0    0
   4    0    3    0    0    0    0    0
   5    3    6    3    3    0    0    1
   6    0    3    0    0    3    0    3
   7   11    7    4    7    3    0    3
   8    3    3    0    3    0    0    3
   9    3    3    0    3    0    0    3
  10    0    3    0    0    0    0    0
#OBS is the observation number, and hhi shows how many households with i people there are. IQR is the interquartile range for each observation; this is the variable I am building.
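For anyone who wants to reproduce the example, the ten rows above can be entered as a data frame, and the IQR column can be checked in base R by expanding each row's counts into individual household sizes. This is a sketch, not the code used on the full dataset: `A_sample`, `iqr_from_counts` and `iqr_check` are hypothetical names, and the agreement with `wtd.quantile` is only checked here against these ten rows, not in general.

```r
# Sample rows from the table above: counts per household size and the expected IQR
A_sample <- data.frame(
  hh1 = c(0, 0, 0, 0, 3, 0, 11, 3, 3, 0),
  hh2 = c(3, 3, 0, 3, 6, 3, 7, 3, 3, 3),
  hh3 = c(0, 0, 3, 0, 3, 0, 4, 0, 0, 0),
  hh4 = c(0, 0, 0, 0, 3, 0, 7, 3, 3, 0),
  hh5 = c(0, 0, 0, 0, 0, 3, 3, 0, 0, 0),
  hh6 = rep(0, 10),
  IQR = c(0, 0, 0, 0, 1, 3, 3, 3, 3, 0)
)

# IQR of household size for one hectare, given the six counts.
# Each size is repeated as many times as it occurs, then the usual
# quantile() (default type 7) is applied to the expanded vector.
# Note: a row with all counts equal to zero would give NA.
iqr_from_counts <- function(counts, sizes = 1:6) {
  expanded <- rep(sizes, times = counts)
  unname(diff(quantile(expanded, c(0.25, 0.75))))
}

# Apply the helper to every row of the count columns
iqr_check <- apply(as.matrix(A_sample[, paste0("hh", 1:6)]), 1, iqr_from_counts)

# TRUE when the recomputed values match the IQR column of the sample
matches_table <- isTRUE(all.equal(unname(iqr_check), A_sample$IQR))
```

Expanding the counts is fine for a small sample like this one, but it allocates one element per household, so it is meant as a check of the expected values rather than as a performance recommendation.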