Without some justification for why you want to run the test, and perhaps an explanation of why you expect it to differentiate certain sites, about 100 of them will come out as non-normal simply by chance. If you want to check whether the water quality data are normal in general, it's best to check all of the data at once. The means will vary from site to site, so what you can check are the residuals of a linear model with the factor Sitecode as a predictor.
library(nortest)                                  # for ad.test()
dat <- read.csv('myDataFileName.csv')
m <- lm(Mean_res ~ factor(Sitecode), data = dat)  # factor() guards against numeric site codes
res <- resid(m)                                   # residuals with the site means removed
Now you can run your Anderson-Darling test on res:
ad.test(res)
But just for fun, run the AD test a few times on samples of your size drawn from a known normal distribution, and look at the qqnorm plots to see what they look like.
y <- rnorm(nrow(dat))   # a sample the same size as yours from a true normal
ad.test(y)              # still "fails" occasionally, purely by chance
qqnorm(y); qqline(y)    # and yet the plot looks thoroughly normal
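You can also repeat that draw many times and count how often the test rejects data that are normal by construction (a quick sketch, reusing nrow(dat) as the sample size):
pvals <- replicate(1000, ad.test(rnorm(nrow(dat)))$p.value)
mean(pvals < 0.05)   # sits near 0.05: the test's false-alarm rate under true normality
That proportion stays near alpha no matter how large each sample is; big N buys you power against real deviations, not protection from false alarms.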
What you'll find with so many points is that you'll still fail the AD test once in a while even though the data look surprisingly normal. So the answer is probably not an AD test. It is probably best to just look at a plot of the residuals and assess normality there.
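For your data that visual check is just the residuals from the model above (qqnorm, qqline, and hist are base R; the breaks value is only a plausible choice for a large data set):
qqnorm(res); qqline(res)   # points should hug the line if the residuals are roughly normal
hist(res, breaks = 50)     # a histogram gives a second, coarser view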
Going back to my first comment, a normality test only tells you whether you can detect a deviation from normality. Just as with t-tests, it is extremely sensitive at very high N and gives false alarms at the alpha rate. It does not tell you that data are normal, so "passing" the tests will not get you a demonstration that the data are normal. Because they are tests against normality, what they'll do is show you which sites are not normal (with many false alarms). Without some reason to believe some of the sites aren't normal, your planned tests are probably not what you want to be doing.
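If you want to see that false-alarm arithmetic concretely, simulate data that are normal at every site and run the per-site tests anyway (a sketch; it assumes every level of Sitecode has at least 8 observations, which nortest's ad.test requires):
site_p <- tapply(rnorm(nrow(dat)), dat$Sitecode, function(x) ad.test(x)$p.value)
sum(site_p < 0.05)   # roughly 5% of sites flagged even though all are exactly normal
That is where the "about 100" figure above comes from: at an alpha of 0.05 you expect about 5% of the sites to fail by chance alone.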