
I've been handed a massive dataset and feel hopelessly lost after 12 hours of work.

The data:

  • Shell measurements from ~3400 lab snails raised at three different temperatures. The snails were sourced from 3 cities, and from 2 habitats within each city.
  • Dependent variable = continuous; snail shell size in cm (n=~3400)
  • Independent variables = city (Key Largo, Miami, Jacksonville), habitat (marsh or beach), and rearing temp (26, 28, or 30 degrees C).

  • Random effect = snail colony (n=~200)

Hypotheses:

  • Shell sizes are larger in beaches than marshes
  • Shell sizes decrease with increasing latitude

What I need to do in ggplot2:

  • Plot the 3 temps on the x axis, and shell size on the y axis

  • Plot the means of each group with error bars

  • Make a line connecting the group means

  • Have two or three lines on the same graph

What I need to do statistically:

  • Run a test that will give me a p-value for the above 2 hypotheses and other potential relationships

What I've tried to get means and SE:

  • I just get a "+" sign when using this code:

    summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE, 
    conf.interval=.95, .drop=TRUE) {
    require(plyr) #New version of length which can handle NA's: if na.rm==T, 
    don't count them
    length2 <- function (x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
    }
    
    # This is does the summary; it's not easy to understand...
    datac <- ddply(data, groupvars, .drop=.drop,
               .fun= function(xx, col, na.rm) {
                       c( N    = length2(xx[,col], na.rm=na.rm),
                          mean = mean   (xx[,col], na.rm=na.rm),
                          sd   = sd     (xx[,col], na.rm=na.rm)
                          )
                      },
                measurevar,
                na.rm
         )
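
On the "+" sign: that is R's continuation prompt, shown when the console is still waiting for the rest of an incomplete expression. The function as pasted above never closes its outer `{`, and the wrapped comment line ("don't count them") leaves an unmatched quote, so R keeps waiting. As a rough alternative that avoids plyr altogether, group means and standard errors can be computed in base R. This is only a sketch on simulated data: the `snails` data frame and its values are made up, and only the column names follow the dput above.

```r
# Simulated stand-in for the real data (values are invented for illustration)
set.seed(1)
snails <- expand.grid(
  city    = c("Key Largo", "Miami", "Jacksonville"),
  habitat = c("marsh", "beach"),
  temp    = c(26, 28, 30),
  rep     = 1:20
)
snails$shell.size <- rnorm(nrow(snails), mean = 1, sd = 0.2)

# Mean, SD and N for every city x habitat x temp group
group_stats <- aggregate(
  shell.size ~ city + habitat + temp,
  data = snails,
  FUN  = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)

# aggregate() stores the three statistics as one matrix column; flatten it
group_stats <- do.call(data.frame, group_stats)
names(group_stats)[4:6] <- c("mean", "sd", "n")

# Standard error of the mean
group_stats$se <- group_stats$sd / sqrt(group_stats$n)

print(head(group_stats))
```

The resulting `group_stats` has one row per group, which is the shape ggplot2 expects if the means and error bars are drawn from precomputed values.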
    

What I've tried graphically:

  • Used concatenate to make a vector of the means, and tried to graph using:

    qplot(temp,wl,data=cleant, facets=.~habitat, geom=c("point","smooth"), method="lm")
    

But I can't connect the means, add SE bars, or put a best-fit line through them.
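
One possible way to get connected group means with SE bars is to let `stat_summary()` compute them inside the plot, instead of building a vector of means by hand. The sketch below uses simulated data (the `snails` data frame and its values are assumptions, only the column names come from the dput), colours the lines by habitat, and puts each city in its own panel.

```r
library(ggplot2)

# Simulated stand-in for the real data (values are invented for illustration)
set.seed(1)
snails <- expand.grid(
  city    = c("Key Largo", "Miami", "Jacksonville"),
  habitat = c("marsh", "beach"),
  temp    = c(26, 28, 30),
  rep     = 1:20
)
snails$shell.size <- rnorm(nrow(snails), mean = 1, sd = 0.2)

p <- ggplot(snails, aes(x = temp, y = shell.size,
                        colour = habitat, group = habitat)) +
  # Mean +/- 1 standard error at each temperature
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3) +
  # Line connecting the group means
  stat_summary(fun = mean, geom = "line") +
  # The group means themselves
  stat_summary(fun = mean, geom = "point", size = 2) +
  scale_x_continuous(breaks = c(26, 28, 30)) +
  facet_wrap(~ city) +
  labs(x = "Rearing temperature (degrees C)", y = "Shell size (cm)")
```

Calling `print(p)` draws the figure. If a straight best-fit line is wanted instead of a line through the means, adding `geom_smooth(method = "lm", se = FALSE)` fits one through the raw points in each panel.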

Analyses I've tried:

I made a mixed-effects model:

    model <- lme(shell.size ~habitat * temp *city, random = ~1|colony, data = FL.snails)

And ran an anova:

    anova(model, type = 'marginal')

But I have no idea if that tests my hypotheses.

There are so many relationships and hierarchies it's making my head spin. With 2 habitats, 3 cities, and 3 temps, I'm overwhelmed with all the possible combinations of things to test for.

Any help with any of the above would be infinitely appreciated. I am really having a hard time.

Here is the abbreviated dput:

    28L, 28L,....., 28L, 26L,...... 26L, 26L, 30L,...... 30L, ..........0.683, 1.283)), .Names = c("colony", "individual", "city", "habitat", "temp", "shell.size"), class = "data.frame", row.names = c(NA, -5471L))
Brian Tompsett - 汤莱恩
  • Stack Overflow is for specific programming questions, it's not a discussion forum for general advice. Try to edit your post to ask one specific question (choose the modeling or the plotting -- post the other as a separate question). Additionally, when asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions (an "abbreviated" dput is not really helpful). – MrFlick Jun 22 '18 at 18:40
  • I'm trying to help you improve your question so you can get an answer. Edit to focus on one specific problem. We can help you, but we won't just do some big homework-like assignment for you. Don't get discouraged, just try to edit your post. – MrFlick Jun 22 '18 at 19:03
  • Sorry for your frustration. It looks like you have a rich, interesting data set. It may just take more time doing exploratory plots and models. In my job, I might spend 2 weeks working up that type of data set. Also, I hope you can be patient with yourself learning ggplot2. It took me many months to become fluent, but it has been worth the time. – bdemarest Jun 22 '18 at 23:51
  • Thank you both for your suggestions and encouragement. I was getting very frustrated and impatient with R, but I have since spent a considerable amount of time working in R and have been able to do what I had set out to do. – Richard Gourderton Jul 04 '18 at 17:09

1 Answer


Wait, did you try to run summary() on your model?

tidy() from the broom package might come in handy as well if you are trying to pull coefficients into a single tibble.

Maybe this will help you? https://www.r-bloggers.com/linear-models-anova-glms-and-mixed-effects-models-in-r/
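
To show what `summary()` gives for a model like the one in the question, here is a minimal sketch on simulated data. The data frame below is invented (only the column names and the model formula follow the question), and `summary(model)$tTable` is used to pull the fixed-effect table out as a matrix. If a tibble is preferred, my understanding is that `tidy()` for `lme` objects now lives in the broom.mixed package, but check that against your installed version.

```r
library(nlme)

set.seed(42)
# Simulated stand-in: 36 colonies, each belonging to one city and one habitat
colonies <- expand.grid(
  city    = c("Key Largo", "Miami", "Jacksonville"),
  habitat = c("marsh", "beach"),
  rep     = 1:6
)
colonies$colony        <- factor(seq_len(nrow(colonies)))
colonies$colony_effect <- rnorm(nrow(colonies), sd = 0.05)

# Cross every colony with 3 rearing temperatures and 5 snails per temperature
FL.snails <- merge(colonies,
                   expand.grid(temp = c(26, 28, 30), individual = 1:5))

# Invented response: beach snails slightly larger, plus colony and residual noise
FL.snails$shell.size <- 1 +
  0.10 * (FL.snails$habitat == "beach") +
  FL.snails$colony_effect +
  rnorm(nrow(FL.snails), sd = 0.1)

model <- lme(shell.size ~ habitat * temp * city,
             random = ~ 1 | colony, data = FL.snails)

# Fixed-effect estimates with standard errors, df, t-values and p-values
coef_table <- summary(model)$tTable
print(round(coef_table, 3))
```

Keep in mind that with interactions in the model, each main-effect row is conditional on the reference levels (or zero values) of the other terms, so a single row does not by itself answer "are beach shells larger overall".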

BBlank
  • Thanks for your help. I used summary(), and it gave me many values, but didn't take the different temperature treatments into account. I'm going to read that link; it looks like good info. Thank you BBlank – Richard Gourderton Jun 22 '18 at 20:29
  • I think you're on to something. I changed my temperatures to factors (apparently they weren't factors when I called "is.factor"), and when I hit summary(), I get all kinds of p values and relationships. You made me realize the temperatures weren't factors this whole time. I think that was giving me problems. – Richard Gourderton Jun 22 '18 at 21:16
  • @RichardGourderton I hope you understand the difference between "factor" and "continuous". I wonder if it's legitimate to code `Temperature` as a factor? Conceptually, factors are variables which take on a limited number of different values; such variables are often referred to as categorical variables. – mnm Jun 23 '18 at 07:07
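
On the factor versus continuous point raised in the last comment: with only three rearing temperatures either coding can be defended. A numeric `temp` estimates a single slope, while `factor(temp)` estimates a separate mean for each temperature. One way to check whether the extra flexibility is needed is a likelihood-ratio comparison of the two fits. The sketch below uses simulated data (all values invented, column names as in the question) and fits with `method = "ML"`, since REML fits with different fixed effects are not comparable.

```r
library(nlme)

set.seed(7)
# Simulated stand-in: 30 colonies, 3 temperatures, 5 snails per combination
d <- expand.grid(colony = factor(1:30),
                 temp = c(26, 28, 30),
                 individual = 1:5)
colony_effect <- rnorm(30, sd = 0.05)

# Invented response with a purely linear temperature effect
d$shell.size <- 1 + 0.02 * (d$temp - 26) +
  colony_effect[as.integer(d$colony)] +
  rnorm(nrow(d), sd = 0.1)

# Temperature as a continuous covariate: one slope
m_linear <- lme(shell.size ~ temp, random = ~ 1 | colony,
                data = d, method = "ML")

# Temperature as a factor: a separate mean for 26, 28 and 30
m_factor <- lme(shell.size ~ factor(temp), random = ~ 1 | colony,
                data = d, method = "ML")

# Likelihood-ratio test: does the factor coding fit better than a straight line?
cmp <- anova(m_linear, m_factor)
print(cmp)
```

A large p-value in the comparison suggests the straight-line model is adequate. A small one suggests the response to temperature is not linear, in which case the factor coding is the safer choice.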