How to compare samples against some baseline data?

Question

I have pollution index (PI) data collected from 5 sites (BS, PL, GB, EM and CB). Now I want to run an ANOVA test in order to compare sites by using EM as the reference site.

How can I do this using r?

Here's some of my data:

Site    PI
BS  0.229056825
BS  0.217159476
BS  0.229508197
BS  0.127530364
BS  0.112942122
PL  0.198957428
PL  0.196986402
PL  0.126753976
PL  0.092816275
PL  0.143828265
GB  0.26883746
GB  0.156452227
GB  0.206349206
GB  0.214451538
GB  0.216014235
EM  0.192636423
EM  0.164269912
EM  0.181659896
EM  0.161620295
EM  0.145764576
CB  0.38490566
CB  0.111111111
CB  0.092592593
CB  0.40625
CB  0.68852459

See if this helps: http://stackoverflow.com/questions/3872070/how-to-force-r-to-use-a-specified-factor-level-as-reference-in-a-regression — LJW, May 05 '15 at 03:00
Can you also please say what you've tried and where you are getting stuck? — LJW, May 05 '15 at 03:04
Thank you very much for your answer. Actually I want to get significance level (p - value) for each of 4 sites (BS,PL,GB and CB) compare to the EM (reference) site. — Nuwan Silva, May 06 '15 at 11:25

Jaehyeon Kim · Accepted Answer · 2015-05-05T04:15:46.000

ANOVA by linear regression is an easy way and anova() in the stats package accepts a lm object.

Assuming that your data is in a data frame called as df, the following shows a way.

It loops over sites except for the reference site (EM) and do the ANOVA test, followed by fitting linear regression. A list of sites and anova results is returned by the loop (comparison).

In the console output, the first element is the site that's compared (BS) and the second element is the ANOVA result.

# reference PI
ref <- df[df$Site=="EM", 2]

# site names except for EM, used to filter df
sites <- unique(df$Site)[unique(df$Site) != "EM"]

comparison <- lapply(sites, function(x) {
   # ANOVA for comparion
   fit <- lm(ref ~ df[df$Site==x, 2])
   list(x, anova(fit))
})
comparison[[1]]

#[[1]]
#[1] BS
#Levels: BS CB EM GB PL

#[[2]]
#Analysis of Variance Table

#Response: ref
#Df     Sum Sq    Mean Sq F value  Pr(>F)  
#df[df$Site == x, 2]  1 0.00093945 0.00093945  7.1162 0.07584 .
#Residuals            3 0.00039605 0.00013202                  
#---
#    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

How to compare samples against some baseline data?

1 Answers1