
I'm having problems looping through variables using the survey package. Let's say I have a subset of variables I collect into a dataframe together with the survey weight and I want to carry out chi-square tests. Bearing in mind the problems with multiple testing, I would still like to test all unique combinations. This is normally relatively straightforward in R, and there's a good example here.

Unfortunately this becomes harder in the survey package because the variables need to be in the design object and, most importantly, dataset indexing is not supported (at least as far as I know). I've tried adapting the example mentioned above to svychisq, but all my strategies have failed.

I've noticed that someone has done something similar here, but most of the variables are fixed. Would anyone be able to create a function (something similar to this answer, maybe) but using the svychisq function? Unfortunately I don't know of any datasets available online that have lots of categorical variables and a complex design. For the purposes of demonstration, I suppose one could use dclus1 from data(api), as shown in the function's help file, and attempt to loop through the first 10 variables:

library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svychisq(~sch.wide+stype, dclus1)

Any help would be greatly appreciated.

UPDATE: What I'm really trying to do is avoid specifying the variable names one by one and supply a vector of variable combinations instead, e.g.

MyChi2tests <- apply( combn(colnames(apiclus1[,c(2,16:17)]),2), 2, function(z) paste(z, collapse = '+')) 
maycobra

1 Answer

library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

# run a simple example svychisq() function
svychisq( ~sch.wide+stype , dclus1 )

# create a function that requires a character string (containing the variables)
# and the design object and runs the svychisq() 
# on the contents of that character string's columns
scsloop <- function( vars , design ){ svychisq( as.formula( paste0( "~" , vars ) ) , design ) }

# test it out
scsloop( "sch.wide+stype" , dclus1 )
scsloop( "sch.wide+comp.imp" , dclus1 )

# either create a character vector to run it multiple times
cols.to.chisq <- c( "sch.wide" , "comp.imp" , "stype" )

# or identify them based on column number, if you prefer
cols.to.chisq <- names( apiclus1 )[ c( 2 , 16 , 17 ) ]


# find every combination of that vector, taken two at a time
combos <- combn( cols.to.chisq , 2 )

# separate them by +
col.combos <- paste( combos[ 1 , ] , combos[ 2 , ] , sep = "+" )

# run it on each of those character strings (print to screen and save to list)
( x <- lapply( col.combos , scsloop , dclus1 ) )

# just for kicks, print everything to the screen
col.combos[1] ; x[[1]]
col.combos[2] ; x[[2]]
col.combos[3] ; x[[3]]
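
Since the question raises multiple testing, here is a minimal sketch of one way to gather the pairwise results into a single table. It is an illustration under stated assumptions, not part of the original answer: the helper name `pairwise_svychisq` and its `method` argument are hypothetical, the formula is built with base R's `reformulate()` instead of `paste0()`, and the Holm correction via `p.adjust()` is just one reasonable choice of adjustment.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# hypothetical helper (not from the survey package):
# runs svychisq() on every pair of the named variables and
# returns one row per pair with raw and adjusted p-values
pairwise_svychisq <- function(vars, design, method = "holm") {
  # list of length-2 character vectors, one per unique pair
  pairs <- combn(vars, 2, simplify = FALSE)

  # reformulate(c("a", "b")) builds the one-sided formula ~a + b
  tests <- lapply(pairs, function(p) svychisq(reformulate(p), design))

  out <- data.frame(
    var1    = vapply(pairs, `[`, character(1), 1),
    var2    = vapply(pairs, `[`, character(1), 2),
    p.value = vapply(tests, function(tst) as.numeric(tst$p.value), numeric(1)),
    stringsAsFactors = FALSE
  )

  # adjust for the number of tests actually run (assumed choice: Holm)
  out$p.adjusted <- p.adjust(out$p.value, method = method)
  out
}

res <- pairwise_svychisq(c("sch.wide", "comp.imp", "stype"), dclus1)
res
```

Passing `names(apiclus1)[c(2, 16, 17)]` as `vars` should select the same three columns by position, as in the answer above.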
Anthony Damico
  • Thank you! Just one thing. My aim is to put all the unique variable combinations in one vector (otherwise listing the pairs is pretty much like doing it with the original function). Take an example with just three unique combinations, from variables 2, 16 and 17. When I try to create cols.to.chisq automatically like this: cols.to.chisq <- apply( combn(colnames(apiclus1[,c(2,16:17)]),2), 2, function(z) paste(z, collapse = '+')) and then run it in lapply as above, I get an error: Error in as.integer(.margins) : cannot coerce type 'closure' to vector of type 'integer' – maycobra Nov 16 '12 at 09:35
  • # i've edited my answer.. i think that's what you need? but when i run your code.. cols.to.chisq <- apply( combn(colnames(apiclus1[,c(2,16:17)]),2), 2, function(z) paste(z, collapse = '+')) # and then lapply using my scsloop function lapply( cols.to.chisq , scsloop , dclus1 ) # it works fine for me.. :) – Anthony Damico Nov 16 '12 at 12:31
  • I love the above answer; it's helped me immensely with a data set that has over 2000 variables. Can I just ask, is there a way to incorporate subsets of the data? For example, what if I want to run the above function across all my variables for male respondents vs female respondents? – DataDancer Jan 29 '16 at 03:57
  • @DataDancer maybe `?svyby` ... otherwise ask a separate stackoverflow question with the [r] and [survey] tags thanks – Anthony Damico Jan 29 '16 at 12:16