I have some survey data. As an example, I use the Credit data from the ISLR package.
library(ISLR)
The distribution of Gender in the data looks like this
prop.table(table(Credit$Gender))
Male Female
0.4825 0.5175
and the distribution of Student looks like this.
prop.table(table(Credit$Student))
No Yes
0.9 0.1
Let's say that, in the population, the actual distribution of Gender is Male/Female (0.35/0.65) and the distribution of Student is Yes/No (0.2/0.8).
In SPSS it's possible to weight the sample by dividing the "population distribution" by the "distribution of the sample" in order to simulate the distribution of the population. This process is called "RIM weighting". The data will only be analyzed with crosstables (i.e. no regression, t-test, etc.). What is a good method in R to weight a sample so that the data can be analyzed with crosstables later on?
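For a single variable, the division described above can be sketched in a few lines of base R. The shares are the Gender figures quoted in this post; the variable names are only illustrative.

```r
# Sample and population shares for Gender, as quoted in the post
sample_share     <- c(Male = 0.4825, Female = 0.5175)
population_share <- c(Male = 0.35,   Female = 0.65)

# Weight factor per category: population share divided by sample share
gender_factor <- population_share / sample_share
print(round(gender_factor, 3))

# Applying the factors to the sample shares gives back the population shares
weighted_share <- sample_share * gender_factor
print(weighted_share)
```

With more than one weighting variable, a single division is no longer enough, because correcting one margin disturbs the other. RIM weighting (raking) repeats this adjustment for each variable in turn until all margins are close to their targets.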
It is possible to calculate the RIM weights in R.
install.packages("devtools")
devtools::install_github("ttrodrigz/iterake")
library(iterake)
credit_uni = universe(df = Credit,
    # note: the " Male" level of Credit$Gender is stored with a leading space
    category(
        name = "Gender",
        buckets = c(" Male", "Female"),
        targets = c(.35, .65)),
    category(
        name = "Student",
        buckets = c("Yes", "No"),
        targets = c(.2, .8)))
credit_weighted = iterake(Credit, credit_uni)
-- iterake summary -------------------------------------------------------------
Convergence: Success
Iterations: 5
Unweighted N: 400.00
Effective N: 339.58
Weighted N: 400.00
Efficiency: 84.9%
Loss: 0.178
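To make the summary above less of a black box, here is a minimal base R sketch of raking and of the Kish effective sample size, (sum of weights)^2 / sum of squared weights. It is not iterake's implementation; `rake_weights` is a hypothetical helper, and the respondent-level data are rebuilt from the unweighted Gender by Student counts in this post (177, 16, 183, 24), so only those two variables are assumed.

```r
# Rebuild respondent-level data from the unweighted Gender x Student counts
cells <- expand.grid(Gender = c("Male", "Female"), Student = c("No", "Yes"),
                     stringsAsFactors = FALSE)
cells$n <- c(177, 183, 16, 24)  # Male/No, Female/No, Male/Yes, Female/Yes
dat <- cells[rep(seq_len(nrow(cells)), cells$n), c("Gender", "Student")]

# Target population shares from the post
targets <- list(
  Gender  = c(Male = 0.35, Female = 0.65),
  Student = c(No = 0.80, Yes = 0.20)
)

# Hypothetical helper: adjust the weights one variable at a time until stable
rake_weights <- function(data, targets, max_iter = 100, tol = 1e-10) {
  w <- rep(1, nrow(data))
  for (i in seq_len(max_iter)) {
    w_old <- w
    for (v in names(targets)) {
      # current weighted share of each category of variable v
      current <- vapply(split(w, data[[v]]), sum, numeric(1)) / sum(w)
      # factor = target share / current share, looked up per respondent
      ratio <- targets[[v]] / current[names(targets[[v]])]
      w <- w * unname(ratio[data[[v]]])
    }
    if (max(abs(w - w_old)) < tol) break
  }
  w / mean(w)  # normalise so the weights sum to the sample size
}

w <- rake_weights(dat, targets)

# Weighted margins, to compare with the targets
gender_share  <- tapply(w, dat$Gender, sum) / sum(w)
student_share <- tapply(w, dat$Student, sum) / sum(w)

# Kish effective sample size and efficiency
n_eff      <- sum(w)^2 / sum(w^2)
efficiency <- n_eff / length(w)
print(round(c(effective_n = n_eff, efficiency = efficiency), 3))
```

Under these assumptions the effective N should come out close to the value in the iterake summary, and the efficiency is simply the effective N divided by the unweighted N (339.58 / 400 in the output above).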
Here is the SPSS output (crosstables) of the weighted data:
                 Student
                 No    Yes   Total
Gender  Male     117    23     140
        Female   203    57     260
Total            320    80     400
and here is the output for the unweighted data. (I exported both files and did the calculation in SPSS; I weighted the weighted sample by the calculated weights.)
                 Student
                 No    Yes   Total
Gender  Male     177    16     193
        Female   183    24     207
Total            360    40     400
In the weighted data set, I get the desired distributions: Student Yes/No (0.2/0.8) and Gender Male/Female (0.35/0.65).
Here is another SPSS example, Gender by Married (weighted):
                 Married
                 No    Yes   Total
Gender  Male      57    83     140
        Female   102   158     260
Total            159   241     400
and unweighted.
                 Married
                 No    Yes   Total
Gender  Male      76   117     193
        Female    79   128     207
Total            155   245     400
This doesn't work in R (i.e. both crosstables look like the unweighted one).
library(expss)
cro(Credit$Gender, Credit$Married)
cro(credit_weighted$Gender, credit_weighted$Married)
| | | Credit$Married | |
| | | No | Yes |
| ------------- | ------------ | -------------- | --- |
| Credit$Gender | Male | 76 | 117 |
| | Female | 79 | 128 |
| | #Total cases | 155 | 245 |
| | | credit_weighted$Married | |
| | | No | Yes |
| ---------------------- | ------------ | ----------------------- | --- |
| credit_weighted$Gender | Male | 76 | 117 |
| | Female | 79 | 128 |
| | #Total cases | 155 | 245 |
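A likely reason for the identical tables is that the weights are stored as an extra column and are never passed to the tabulation, so both calls simply count rows. The base R sketch below shows the difference between counting rows and summing weights with xtabs(). The data are rebuilt from the unweighted Gender by Married counts above, and the weights are a stand-in based on Gender only (0.35/0.4825 and 0.65/0.5175), not the iterake weights, so only the Gender margin will match the SPSS table.

```r
# Rebuild respondent-level data from the unweighted Gender x Married counts
cells <- expand.grid(Gender = c("Male", "Female"), Married = c("No", "Yes"),
                     stringsAsFactors = FALSE)
cells$n <- c(76, 79, 117, 128)  # Male/No, Female/No, Male/Yes, Female/Yes
dat <- cells[rep(seq_len(nrow(cells)), cells$n), c("Gender", "Married")]

# Stand-in weights (assumption): Gender-only factors, not the raked weights
gender_factor <- c(Male = 0.35 / 0.4825, Female = 0.65 / 0.5175)
dat$weight <- unname(gender_factor[dat$Gender])

# Counting rows ignores the weight column entirely
unweighted <- xtabs(~ Gender + Married, data = dat)

# Putting the weight on the left-hand side sums the weights per cell
weighted <- xtabs(weight ~ Gender + Married, data = dat)

print(addmargins(unweighted))
print(round(addmargins(weighted)))

# With expss, cro() also accepts a weight argument. Assuming iterake stored
# the weights in a column called `weight`, the call would look like:
# cro(credit_weighted$Gender, credit_weighted$Married,
#     weight = credit_weighted$weight)
```

The commented cro() call is untested here and depends on the name of the weight column in credit_weighted, which names(credit_weighted) would confirm.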