I have a dataset which was stratified across 3 different populations and resulted in the following sampling pattern:

               A        B       C      All
Pop Size     713     2904    4687     8305
Num Sampled   72      135     159

In order to make any statistic representative of the entire distribution I created a weight for each sample population (A,B,C).

To do this I computed the fraction of each population that was sampled, divided the fraction of the entire population that was sampled by it (so that under-sampled populations get larger weights), and then normalized the results to sum to 1.

Weight      0.16     0.35    0.48
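
For reference, a minimal base R sketch that reproduces these figures from the table above. It assumes the weights are proportional to the inverse of each group's sampling fraction (N/n) and normalized to sum to 1, which is the reading that matches 0.16, 0.35 and 0.48; the variable names `N`, `n`, `f_h`, `f` and `raw` are illustrative only.

```r
# Population and sample sizes taken from the table in the question
N <- c(A = 713, B = 2904, C = 4687)   # population size per group
n <- c(A = 72,  B = 135,  C = 159)    # number sampled per group

f_h <- n / N                 # fraction of each population that was sampled
f   <- sum(n) / sum(N)       # overall sampling fraction (cancels on normalizing)

raw <- f / f_h               # under-sampled groups get larger raw weights
wt  <- raw / sum(raw)        # normalize so the weights sum to 1

print(round(wt, 2))          # rounds to 0.16, 0.35, 0.48
```

Because the overall fraction `f` is a common factor, it drops out in the normalization step; only the per-group ratios N/n determine the final weights.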

I then added a column to my data set as follows:

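# Weights for groups A, B, C, in that order
wt <- c(0.16, 0.35, 0.48)
# Positional lookup: assumes PopGroup is a factor with levels A, B, C (or an integer code 1-3)
MyData$Weight <- wt[MyData$PopGroup]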

I can then use this Weight column with the wtd.hist or wtd.mean functions or using the weight aesthetic in ggplot.

What I cannot figure out is how to perform statistical tests on the weighted data. Specifically, neither the shapiro.test nor the prop.test function supports a weight parameter.

Techniquab
  • Generally, requests to determine "normality" are not needed and are based on a faulty understanding of statistical theory. This test should be equally valid for weighted data situations: `install.packages('TeachingDemos'); TeachingDemos::SnowsPenultimateNormalityTest(dat_vector)` – IRTFM Jul 11 '15 at 17:00
  • Unfortunately the 'TeachingDemos' package is not available for R version 3.2.0. – Techniquab Jul 12 '15 at 08:33
  • As for being interested in "normality": it is my understanding that running a valid or robust linear regression or ANCOVA depends on my data being normally distributed and having equal variance. Also, I'm more interested in being able to compute confidence limits on my weighted means (hence the **prop.test**). Thanks – Techniquab Jul 12 '15 at 08:39
  • That understanding, at least the part about the data being normal, is incorrect. Equality of variance is somewhat more important. If you were planning on using prop.test then normality cannot be important, since you would be guaranteed that the data is not normal: it would be discrete, and normality is only meaningful when talking about continuous measures. – IRTFM Jul 12 '15 at 14:54
  • Okay... so I don't need normality for the regression, just equality of variance, and I can use wtd.var to evaluate that. For other parts of the analysis, I'm still looking for a way to test the equality of two means, of either a binary or a continuous variable. The only way I've been able to do it is to use ggplot and add a stat_summary with mean_cl_boot, but this doesn't seem like the best approach and I can't find a way to programmatically retrieve the results. – Techniquab Jul 12 '15 at 15:20
  • For anyone struggling with this same problem... my current workaround is to use the weight aes in ggplot and the stat_summary function to compute the means and confidence limits, and then use this answer: [link](http://stackoverflow.com/questions/9789871/method-to-extract-stat-smooth-line-fit) to get the results out of the plot. – Techniquab Jul 12 '15 at 15:28
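
As a possible alternative to reading confidence limits back out of a plot, here is a minimal base R sketch of the textbook stratified-sampling estimator of a mean with a normal-approximation confidence interval. It is an illustration under stated assumptions, not a method from this thread: the function `strat_mean_ci`, the outcome column `y` and the simulated `MyData` are all hypothetical, and the population sizes are the A, B, C values from the question. The same function covers a binary 0/1 outcome, in which case the estimate is a weighted proportion.

```r
# Population sizes per group, taken from the question's table
N <- c(A = 713, B = 2904, C = 4687)

# Stratified estimate of a population mean with an approximate CI.
# y: numeric (or 0/1) outcome; stratum: group label per row;
# N: named vector of population sizes, one entry per stratum.
strat_mean_ci <- function(y, stratum, N, level = 0.95) {
  stratum <- as.character(stratum)
  W    <- N / sum(N)                              # population share of each stratum
  n    <- tapply(y, stratum, length)[names(N)]    # sample size per stratum
  ybar <- tapply(y, stratum, mean)[names(N)]      # sample mean per stratum
  s2   <- tapply(y, stratum, var)[names(N)]       # sample variance per stratum
  est  <- sum(W * ybar)
  # Variance of the stratified mean, with finite population correction (1 - n/N)
  se   <- sqrt(sum(W^2 * (1 - n / N) * s2 / n))
  z    <- qnorm(1 - (1 - level) / 2)
  c(estimate = est, se = se, lower = est - z * se, upper = est + z * se)
}

# Simulated stand-in for MyData (hypothetical values; sample sizes from the question)
set.seed(1)
MyData <- data.frame(
  PopGroup = rep(c("A", "B", "C"), times = c(72, 135, 159)),
  y        = rnorm(366, mean = rep(c(10, 12, 15), times = c(72, 135, 159)))
)

res <- strat_mean_ci(MyData$y, MyData$PopGroup, N)
print(res)
```

The point estimate here is the same number that a weighted mean gives when each row is weighted by N/n for its group; what the sketch adds is a standard error that accounts for the stratified design. For anything beyond a simple mean (for example comparing subgroups), the `survey` package (`svydesign`, `svymean`, `svyttest`) is probably a better fit than hand-rolled formulas.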

0 Answers