I've been handed a massive dataset and feel hopelessly lost after 12 hours of work.
The data:
- Shell measurements from ~3400 lab snails raised at three different temperatures. Sourced from 3 cities, and 2 habitats within those cities.
- Dependent variable = continuous; snail shell size in cm (n=~3400)
Independent variables = city (Key Largo, Miami, Jacksonville), habitat (marsh or beach), and rearing temp (26, 28, or 30 degrees C).
- Random effect = snail colony (n=~200)
Hypotheses:
- Shell sizes are larger in beaches than marshes
- Shell sizes decrease with increasing latitude
What I need to do in ggplot2:
Plot the 3 temps on the x axis, and shell size on the y axis
Plot the means of each group with error bars
Make a line connecting the group means
- Have two or three lines on the same graph
What I need to do statistically:
- Run a test that will give me a p-value for the above 2 hypotheses and other potential relationships
What I've tried to get means and SE:
I just get a "+" sign when using this code:
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) { require(plyr) #New version of length which can handle NA's: if na.rm==T, don't count them length2 <- function (x, na.rm=FALSE) { if (na.rm) sum(!is.na(x)) else length(x) } # This is does the summary; it's not easy to understand... datac <- ddply(data, groupvars, .drop=.drop, .fun= function(xx, col, na.rm) { c( N = length2(xx[,col], na.rm=na.rm), mean = mean (xx[,col], na.rm=na.rm), sd = sd (xx[,col], na.rm=na.rm) ) }, measurevar, na.rm )
What I've tried graphically:
Used concatenate to make a vector of the means, and tried to graph using:
qplot(temp,wl,data=cleant, facets=.~habitat, geom=c("point","smooth"), method="lm")
But I can't connect the means, add SE, or put a best fit line through
Analyses I've tried:
I made a mixed-effects model:
model <- lme(shell.size ~habitat * temp *city, random = ~1|colony, data = FL.snails)
And ran an anova:
anova(model, type = 'marginal')
But I have no idea if that tests my hypotheses.
There are so many relationships and hierarchies it's making my head spin. With 2 habitats, 3 cities, and 3 temps, I'm overwhelmed with all the possible combinations of things to test for.
Any help with any of the above would be infinitely appreciated. I am really having a hard time.
Here is the abbreviated dput
28L, 28L,....., 28L, 26L,...... 26L, 26L, 30L,...... 30L, ..........0.683, 1.283)), .Names = c("colony", "individual", "city", "habitat", "temp", "shell.size"), class = "data.frame", row.names = c(NA, -5471L))