I am trying to run the anesrake function (part of the anesrake package, https://cran.r-project.org/web/packages/anesrake/index.html, which rakes sample weights so that the sample's attribute distributions match those of the population) in R to compute weights across several variables.

I have a table of sample data testData:

Index   GENDER   AGE    
1       Female   18-24  
2       Female   35-64  
3       Male     65+    

Note: AGE has six levels: 18-24, 25-34, 35-44, 45-54, 55-64, 65+.

I then have two vectors of population proportions:

GENDER <- c(.49,.51)
AGE <- c(.08,.1,.12,.2,.2,.3)

I then create a set of target variables and a CASEID column on the original table:

targets <- list(GENDER, AGE)
names(targets) <- c("GENDER", "AGE")
testData$CASEID <- 1:length(testData$GENDER)

I can then see how far my sample data deviates from my population targets:

> anesrakefinder(targets, testData, choosemethod = "total")
   GENDER       AGE 
0.1495337 0.3668394 

But when I call the anesrake function to do the final weighting, I get an error:

> anesrake(inputter=targets,dataframe=testData,caseid=testData$CASEID)
Error in rakeonvar.default(mat[, i], inputter[[i]], weightvec) : 
  number of variable levels does not match number of weighting levels
In addition: Warning message:
In rakeonvar.default(mat[, i], inputter[[i]], weightvec) :
  NAs introduced by coercion

I've been following two tutorials on how to use anesrake, but I'm still coming up short. These are the tutorials:

http://sdaza.com/survey/2012/08/25/raking/

http://surveyinsights.org/wp-content/uploads/2014/07/Full-anesrake-paper.pdf

Any help that you could provide on this would be greatly, greatly appreciated.

Cheers,

Stu

Stu Richards

2 Answers

You need to name the elements of each target vector after the levels of the corresponding variable in the data. For example, with a data frame rak2 whose factor columns are agecat1 and newpayer:

names(targets$agecat1) <- levels(rak2$agecat1)
names(targets$newpayer) <- levels(rak2$newpayer)
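
Applied to the variables in the question, the same idea might look like the sketch below. It assumes GENDER and AGE are already factors and that the proportions in each target vector are listed in the same order as the factor levels (Female before Male, youngest age band first). The sample rows are made up for illustration.

```r
# Hypothetical sample rows; AGE declares all six bands even if some are unobserved.
age_levels <- c("18-24", "25-34", "35-44", "45-54", "55-64", "65+")
testData <- data.frame(
  GENDER = factor(c("Female", "Female", "Male")),
  AGE    = factor(c("18-24", "35-44", "65+"), levels = age_levels)
)

# Population proportions from the question, assumed to follow the level order.
targets <- list(
  GENDER = c(.49, .51),
  AGE    = c(.08, .1, .12, .2, .2, .3)
)

# Name every target vector after the levels of the matching column.
for (v in names(targets)) {
  names(targets[[v]]) <- levels(testData[[v]])
}

print(targets)
```

The named list can then be passed as inputter in the same anesrake call as before.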
Marco Sandri

I just solved the same issue by transforming my data from character to factor.

You can try the following:

testData$GENDER <- as.factor(testData$GENDER) 
testData$AGE <- as.factor(testData$AGE)
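
One caveat with as.factor: it only creates levels for values that actually occur in the column, so a small sample that is missing some age bands would still have fewer levels than the six target proportions. Here is a minimal sketch of declaring the levels explicitly with factor(), using made-up rows, in case that applies to your data:

```r
age_levels <- c("18-24", "25-34", "35-44", "45-54", "55-64", "65+")

# Hypothetical character column with only three of the six bands present.
age_chr <- c("18-24", "35-44", "65+")

# as.factor() keeps only the observed values as levels.
print(nlevels(as.factor(age_chr)))

# factor() with explicit levels keeps all six declared bands.
age_fac <- factor(age_chr, levels = age_levels)
print(nlevels(age_fac))

# Values that are not among the declared levels become NA, so check for them.
print(anyNA(age_fac))
```

If anyNA() reports TRUE, some label in the data does not match any declared level and should be fixed before raking.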
Daniel MB