
I have a dataset (STATPOP2016 from the Swiss Federal Statistical Office) that contains the number of households of each size for every hectare of Swiss territory. In other words, for each hectare i I have:

x1 households consisting of one individual

x2 households consisting of two individuals

...

x6 households with 6 or more individuals (I consider them as having 6 people for simplicity).

I need to create a variable that gives the interquartile range of household size for each hectare, weighting each size by its number of households. My code works, but it is very slow. Is there a smarter way to do the same thing?

Here is my code:

# Vector that contains all possible sizes of households    
vector_hh_size <- c(1:6)

# Variable for interquartile range in household size. A is my dataframe
A$hh_size_IQR <- 0 

# Vector that contains the frequency of each household size in a given hectare
vector_hh_frequency <- c(0,0,0,0,0,0)

for (i in 1:NROW(A)) {
  for (j in 1:6){
    vector_hh_frequency[j] <- eval(parse(text = paste("A$hh",j,"[",i,"]",sep = "")))
  }

  A$hh_size_IQR[i] <- wtd.quantile(vector_hh_size, weights = vector_hh_frequency)[4] - wtd.quantile(vector_hh_size, weights = vector_hh_frequency)[2]
}

Here is an example of the data:

   hh1 hh2 hh3 hh4 hh5 hh6         IQR
1    0   3   0   0   0   0           0
2    0   3   0   0   0   0           0
3    0   0   3   0   0   0           0
4    0   3   0   0   0   0           0
5    3   6   3   3   0   0           1
6    0   3   0   0   3   0           3
7   11   7   4   7   3   0           3
8    3   3   0   3   0   0           3
9    3   3   0   3   0   0           3
10   0   3   0   0   0   0           0

The first column is the observation number, and hhi shows how many households with i people there are. IQR is the interquartile range for each observation; this is the variable I am building.

slightlym

1 Answer


Here is a shorter version of your code:

library("Hmisc")

A <- read.table(header=TRUE, text=
"    hh1 hh2 hh3 hh4 hh5 hh6
  1    0   3   0   0   0   0 
  2    0   3   0   0   0   0
  3    0   0   3   0   0   0
  4    0   3   0   0   0   0
  5    3   6   3   3   0   0
  6    0   3   0   0   3   0
  7   11   7   4   7   3   0
  8    3   3   0   3   0   0
  9    3   3   0   3   0   0
  10   0   3   0   0   0   0")

vector_hh_size <- seq_len(ncol(A))

# Weighted IQR of one row: compute the quantiles once, then take 75% minus 25%
myIQR <- function(Ai) {
  q <- wtd.quantile(vector_hh_size, weights = Ai)
  unname(q[4] - q[2])
}
A$IQR <- apply(A, 1, myIQR)
# > A
#    hh1 hh2 hh3 hh4 hh5 hh6 IQR
# 1    0   3   0   0   0   0   0
# 2    0   3   0   0   0   0   0
# 3    0   0   3   0   0   0   0
# 4    0   3   0   0   0   0   0
# 5    3   6   3   3   0   0   1
# 6    0   3   0   0   3   0   3
# 7   11   7   4   7   3   0   3
# 8    3   3   0   3   0   0   3
# 9    3   3   0   3   0   0   3
# 10   0   3   0   0   0   0   0
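
If apply() is still too slow on the full data set, the per-row call can be avoided altogether. The following is only a sketch and is not part of Hmisc: it assumes the weights are non-negative integer counts, in which case the default wtd.quantile() result should match the ordinary type-7 quantile of the expanded sample rep(1:6, times = counts). The names row_wtd_iqr and hh_cols are made up for this illustration. It works on the whole matrix at once using cumulative counts, and the last lines cross-check it against base R's IQR() on the sample data.

```r
# Sample data from the question (the first column becomes the row names)
A <- read.table(header = TRUE, text =
"    hh1 hh2 hh3 hh4 hh5 hh6
  1    0   3   0   0   0   0
  2    0   3   0   0   0   0
  3    0   0   3   0   0   0
  4    0   3   0   0   0   0
  5    3   6   3   3   0   0
  6    0   3   0   0   3   0
  7   11   7   4   7   3   0
  8    3   3   0   3   0   0
  9    3   3   0   3   0   0
  10   0   3   0   0   0   0")

hh_cols <- paste0("hh", 1:6)  # hypothetical helper: names of the count columns

# Hypothetical helper: IQR of household size for every row of a count matrix.
# Column j of W holds the number of households of size j.
# Assumption: counts are non-negative integers (type-7 quantile definition).
row_wtd_iqr <- function(W) {
  W <- as.matrix(W)
  n <- rowSums(W)                      # households per hectare

  # Cumulative counts along each row, built column by column
  cumW <- W
  for (j in seq_len(ncol(W))[-1]) cumW[, j] <- cumW[, j - 1] + W[, j]

  quant <- function(p) {
    h    <- 1 + (n - 1) * p            # fractional position in the sorted sample
    lo   <- floor(h)
    hi   <- pmin(lo + 1, n)
    frac <- h - lo
    # The k-th smallest household size is 1 + number of sizes
    # whose cumulative count is still below k
    size_lo <- 1 + rowSums(cumW < lo)
    size_hi <- 1 + rowSums(cumW < hi)
    (1 - frac) * size_lo + frac * size_hi
  }

  out <- quant(0.75) - quant(0.25)
  out[n == 0] <- NA                    # hectares without households: undefined
  unname(out)
}

A$IQR_fast <- row_wtd_iqr(A[hh_cols])

# Cross-check against base R on the expanded sample (slow, for validation only)
ref <- apply(as.matrix(A[hh_cols]), 1,
             function(w) IQR(rep(seq_along(w), times = w)))
all.equal(A$IQR_fast, unname(ref))
```

It is worth comparing the result with the wtd.quantile() version on a sample of rows before relying on it for the whole data set.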
jogo