I'm trying out discrepancy analysis. Because my sequence data are large, I'm aggregating identical cases and using the resulting weights with the WeightedCluster package. Everything works smoothly until I get to the actual `dissassoc()` step, where I can't figure out how to pass it my group variables.
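For context, this is how I understand the weighted quantities in discrepancy analysis (following my reading of Studer et al. 2011), with case weights $w_i$, dissimilarities $d_{ij}$ between cases $i$ and $j$, and $m$ groups:

$$
\begin{aligned}
W &= \sum_{i=1}^{n} w_i, \qquad SS = \frac{1}{W}\sum_{i=1}^{n}\sum_{j=i+1}^{n} w_i\, w_j\, d_{ij},\\
SS_W &= \sum_{g=1}^{m} SS_g, \qquad SS_B = SS_T - SS_W, \qquad R^2 = \frac{SS_B}{SS_T},
\end{aligned}
$$

where $SS_T$ is $SS$ over all $n$ cases and $SS_g$ is the same quantity computed over the cases of group $g$ only. So every case needs a row and column in the distance matrix, a weight, and a group label, all in the same order.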
I've tried to closely follow the examples in the WeightedCluster manual and in Studer et al. (2011). The post How to use discrepancy analysis with TraMineR and aggregated sequence data? was useful and helped me get further, but I cannot figure out how to get from there to passing the right group variable to `dissassoc()`. Let's say I'm using the same example data (although my original data doesn't have sampling weights), but I can only work with aggregated data:
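```r
library(TraMineR)
library(WeightedCluster)
data(mvad)
## State alphabet, codes and labels, as in the TraMineR examples
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school", "training")
mvad.labels <- c("employment", "further education", "higher education", "joblessness", "school", "training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
## Aggregate example data
mvad.agg <- wcAggregateCases(mvad[, c(10:12, 17:86)], weights = mvad$weight)
mvad.agg
## Define sequence object
mvad.agg.seq <- seqdef(mvad[mvad.agg$aggIndex, 17:86], alphabet = mvad.alphabet,
                       states = mvad.scodes, labels = mvad.labels,
                       weights = mvad.agg$aggWeights)
## Computing OM dissimilarities
mvad.agg.dist <- seqdist(mvad.agg.seq, method = "OM", indel = 1.5, sm = "CONSTANT")
## Discrepancy analysis (this is the step that fails)
dissassoc(mvad.agg.dist, group = mvad$gcse5eq, weights = mvad.agg$aggWeights, weight.permutation = "replicate")
```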
So in the last step, I cannot figure out how to link to the group variable. I've tried different ways of specifying the group (e.g., `mvad.agg$gcse5eq`, `mvad$gcse5eq`) and many variations of disaggregating/aggregating and weighting/unweighting the data, but I get either "Object gcse5eq not found" or "Error in diss[!is.na(group), !is.na(group)] : incorrect number of dimensions".
I'm new to SO, so I hope my example is clear and useful. Any help is appreciated!