
I'm trying to produce scatterplots with the regression equation and r² for grouped data.

I can do it for one group, but with grouped data I'm having trouble calculating the equations and r² for all groups in a way that can be extracted automatically and added as annotations.
I believe I'm pretty close and just making some silly mistake, but I can't seem to identify it.

1 - First, I create a function that fits the model and builds a character string with the results.

    library(dplyr)

    # Fit Sepal.Length ~ Sepal.Width on `df` and return the equation and r^2
    # as a string that ggplot2 can later parse into a plotmath expression.
    eqlabels <- function(df) {
      m <- lm(Sepal.Length ~ Sepal.Width, df)
      eq <- substitute(italic(y) == a + b * italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                       list(a  = format(coef(m)[1], digits = 3),
                            b  = format(coef(m)[2], digits = 3),
                            r2 = format(summary(m)$r.squared, digits = 2)))
      as.character(as.expression(eq))
    }
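Before grouping, a quick way to check step 1 is to call the function on a single species. The function is repeated here (argument renamed to `df`, otherwise unchanged) so the snippet runs on its own; the setosa subset is just an example:

```r
# eqlabels() from the question, repeated so this snippet is self-contained.
eqlabels <- function(df) {
  m <- lm(Sepal.Length ~ Sepal.Width, df)
  eq <- substitute(italic(y) == a + b * italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                   list(a  = format(coef(m)[1], digits = 3),
                        b  = format(coef(m)[2], digits = 3),
                        r2 = format(summary(m)$r.squared, digits = 2)))
  as.character(as.expression(eq))
}

# Fit on one species only: the result is a single character string.
setosa_label <- eqlabels(subset(iris, Species == "setosa"))
setosa_label

# The string should be valid plotmath syntax, which is what parse = TRUE needs later.
parse(text = setosa_label)
```

If this prints one string and parses cleanly, step 1 is fine and the problem is only in how the function is applied per group.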

I got this far, but at step 2 it all breaks down:

2 - Now I need to apply the function to the grouped data.

This post suggests using ddply (from the plyr package). I tried to replace that with an equivalent from the dplyr package, as suggested here.

    labelsP3 <- iris %>% group_by(Species) %>% do(eqlabels(.))

However, this results in an error (and then nothing plots):

    Error: Results are not data frames at positions: 1, 2, 3
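The error arises because, when do() gets an unnamed expression, each group's result has to be a data frame, while eqlabels() returns a single string. Besides wrapping the string in a data frame (as the answer below does), do() also accepts a named argument, in which case it stores each group's result in a list-column. A sketch of that route, assuming dplyr is installed; note that do() is marked superseded as of dplyr 1.0.0, though it still works:

```r
library(dplyr)

# eqlabels() from the question (argument renamed to `df`), repeated so the
# snippet runs on its own.
eqlabels <- function(df) {
  m <- lm(Sepal.Length ~ Sepal.Width, df)
  eq <- substitute(italic(y) == a + b * italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                   list(a  = format(coef(m)[1], digits = 3),
                        b  = format(coef(m)[2], digits = 3),
                        r2 = format(summary(m)$r.squared, digits = 2)))
  as.character(as.expression(eq))
}

# Named argument: do() keeps each group's string in a list-column called
# `label` instead of requiring a data frame per group.
labels_list <- iris %>%
  group_by(Species) %>%
  do(label = eqlabels(.))

# Flatten the list-column into an ordinary data frame for geom_text().
labels_flat <- data.frame(Species = labels_list$Species,
                          label   = unlist(labels_list$label),
                          stringsAsFactors = FALSE)
labels_flat
```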

As suggested here, I tried:

    labelsP3 <- iris %>% group_by(Species) %>% do(with(eqlabels(iris)))

But this results in an error:

    Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'character'
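Two separate things seem to go wrong in this attempt. with() has the form with(data, expr); here its first argument is the character string returned by eqlabels(iris), so eval() receives a string where it expects a data frame or environment, hence "invalid 'envir' argument of type 'character'". And inside do(), the current group is `.`, whereas `iris` always refers to the full data set, so even a working call would fit one model to all 150 rows. A small sketch showing both points, assuming dplyr is installed:

```r
library(dplyr)

# Inside do(), `.` is the current group's rows; `iris` is always the full
# 150-row data set, whatever group is being processed.
sizes <- iris %>%
  group_by(Species) %>%
  do(data.frame(rows_in_dot = nrow(.), rows_in_iris = nrow(iris)))
sizes

# with() wants with(data, expr); handing it a string as `data` reproduces
# the "invalid 'envir' argument of type 'character'" error.
res <- try(with("a string", 1 + 1), silent = TRUE)
inherits(res, "try-error")
```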

The plotting step should then look something like this, but I'm stuck at the previous stage:

    library(ggplot2)

    plot3 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point(colour = "grey60") +
      facet_grid(Species ~ .) +
      stat_smooth(method = lm) +
      annotate("text", label = labelsP3, parse = TRUE)
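One caveat with annotate(): it adds the same fixed text to every panel, needs explicit x and y, and expects a character vector for label rather than a data frame, so it cannot put a different label in each facet. Per-facet text is usually done with geom_text() and a small data frame that contains the faceting column. A minimal sketch of that mechanism, assuming ggplot2 is installed and using placeholder labels (group sizes) rather than the regression equations:

```r
library(ggplot2)

# One row per panel; the Species column tells geom_text() which facet each
# row belongs to. The labels are placeholders, not the regression text.
facet_labels <- data.frame(
  Species = levels(iris$Species),
  label   = paste0("n == ", as.vector(table(iris$Species))),
  stringsAsFactors = FALSE
)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = "grey60") +
  facet_grid(Species ~ .) +
  stat_smooth(method = lm) +
  geom_text(data = facet_labels, aes(x = 7, y = 4, label = label), parse = TRUE)
```

The same pattern works once the per-species equation strings are in a data frame with a Species column.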

Thank you.

jpinelo

1 Answer


Alrighty, let's try this again:

Do the following with plyr:

    library(plyr)
    labelsP3 <- ddply(iris, .(Species), eqlabels)

That will get you your equations:

        Species                                                                                 V1
    1     setosa  italic(y) == "2.64" + "0.69" * italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ "0.55"
    2 versicolor italic(y) == "3.54" + "0.865" * italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ "0.28"
    3  virginica italic(y) == "3.91" + "0.902" * italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ "0.21"

Now that you have the equations, you can add them to your plot with:

    geom_text(data = labelsP3, aes(label = V1, x = 7, y = 2), parse = TRUE)
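For readers who want to avoid plyr altogether, roughly the same table can be built in base R with split(), lapply() and rbind(). This is a sketch, not what ddply() does internally; the V1 column name is kept only so that a geom_text(aes(label = V1, ...)) layer works unchanged:

```r
# eqlabels() from the question (argument renamed to `df`), repeated so the
# snippet runs on its own.
eqlabels <- function(df) {
  m <- lm(Sepal.Length ~ Sepal.Width, df)
  eq <- substitute(italic(y) == a + b * italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                   list(a  = format(coef(m)[1], digits = 3),
                        b  = format(coef(m)[2], digits = 3),
                        r2 = format(summary(m)$r.squared, digits = 2)))
  as.character(as.expression(eq))
}

# Split iris into one data frame per species, label each piece, then stack
# the one-row data frames into a single table.
labelsP3 <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  data.frame(Species = d$Species[1], V1 = eqlabels(d), stringsAsFactors = FALSE)
}))
rownames(labelsP3) <- NULL
labelsP3
```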

EDIT: THIRD TIME IS A CHARM

So after a lot of trial and error I got it to work. I still get a warning, but at least it's a step in the right direction. As I suspected earlier, you have to wrap the result in as.data.frame(), because do() needs each group's result to be a data frame:

    labelsP3 <- iris %>% group_by(Species) %>% do(as.data.frame(eqlabels(.)))

You get the following output:

    Source: local data frame [3 x 2]
    Groups: Species [3]

         Species                                                                      eqlabels(.)
          (fctr)                                                                            (chr)
    1     setosa  italic(y) == "2.64" + "0.69" * italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ "0.55"
    2 versicolor italic(y) == "3.54" + "0.865" * italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ "0.28"
    3  virginica italic(y) == "3.91" + "0.902" * italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ "0.21"
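The remaining warning is most likely caused by as.data.frame() turning each string into a factor (the default before R 4.0.0), after which dplyr has to coerce the groups' mismatched factor levels back to character when stacking them; that would also fit the (chr) column type shown above. A hedged variant that avoids the coercion and gives the column a plain name, assuming dplyr is installed:

```r
library(dplyr)

# eqlabels() from the question (argument renamed to `df`), repeated so the
# snippet runs on its own.
eqlabels <- function(df) {
  m <- lm(Sepal.Length ~ Sepal.Width, df)
  eq <- substitute(italic(y) == a + b * italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                   list(a  = format(coef(m)[1], digits = 3),
                        b  = format(coef(m)[2], digits = 3),
                        r2 = format(summary(m)$r.squared, digits = 2)))
  as.character(as.expression(eq))
}

# stringsAsFactors = FALSE keeps each label as character, so no factor
# levels need reconciling when do() combines the groups; the column is
# named `label` instead of `eqlabels(.)`.
labelsP3 <- iris %>%
  group_by(Species) %>%
  do(data.frame(label = eqlabels(.), stringsAsFactors = FALSE))
labelsP3
```

With this, the geom_text() layer can use aes(label = label, ...) instead of the backticked `eqlabels(.)`.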

Does that help?

UPDATE:

For the plotting part, you can do it as follows:

    plot3 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point(colour = "grey60") +
      facet_grid(Species ~ .) +
      stat_smooth(method = lm) + 
      geom_text(data=labelsP3, aes(label=`eqlabels(.)`, x=7, y=2), parse=TRUE)

The x and y in geom_text() set the placement of the label on the graph.
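Fixed x and y values only work while they fall inside every panel's range. A common ggplot2 alternative is x = -Inf and y = Inf, which pins each label to the top-left corner of its panel whatever the data, with hjust/vjust nudging it inwards. In this sketch the axes are deliberately swapped to x = Sepal.Width, y = Sepal.Length so the drawn line is the regression the label describes; the hjust/vjust values are just one choice, and ggplot2 is assumed to be installed:

```r
library(ggplot2)

# eqlabels() from the question (argument renamed to `df`), repeated so the
# snippet runs on its own.
eqlabels <- function(df) {
  m <- lm(Sepal.Length ~ Sepal.Width, df)
  eq <- substitute(italic(y) == a + b * italic(x) * "," ~~ italic(r)^2 ~ "=" ~ r2,
                   list(a  = format(coef(m)[1], digits = 3),
                        b  = format(coef(m)[2], digits = 3),
                        r2 = format(summary(m)$r.squared, digits = 2)))
  as.character(as.expression(eq))
}

# One label per species, built with base R.
labels_df <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  data.frame(Species = d$Species[1], label = eqlabels(d), stringsAsFactors = FALSE)
}))

# Axes follow the model (x = Sepal.Width, y = Sepal.Length). x = -Inf and
# y = Inf anchor each label at its panel's top-left corner; hjust < 0 and
# vjust > 1 then shift the text right and down, into the panel.
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(colour = "grey60") +
  facet_grid(Species ~ .) +
  stat_smooth(method = lm) +
  geom_text(data = labels_df, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, parse = TRUE)
```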

Or this looks even a bit better:

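    ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point(colour = "grey60") +
      facet_grid(Species ~ .) + stat_smooth(method = lm) +
      geom_text(data = labelsP3, aes(label = `eqlabels(.)`, x = 4, y = 0),
                vjust = -1, hjust = -0.5, parse = TRUE)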

Plot of the command above
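One thing worth checking with these plots: eqlabels() fits Sepal.Length ~ Sepal.Width (y = Sepal.Length, x = Sepal.Width), but the plot maps x = Sepal.Length and y = Sepal.Width, so stat_smooth() draws the regression of Sepal.Width on Sepal.Length. Those are two different lines, so the printed equation does not describe the drawn line. A quick numeric check on the setosa subset (base R only):

```r
setosa <- subset(iris, Species == "setosa")

# Model behind the label: Sepal.Length (y) regressed on Sepal.Width (x).
fit_label <- lm(Sepal.Length ~ Sepal.Width, data = setosa)

# Model stat_smooth(method = lm) fits under aes(x = Sepal.Length, y = Sepal.Width).
fit_plot <- lm(Sepal.Width ~ Sepal.Length, data = setosa)

coef(fit_label)
coef(fit_plot)

# For simple linear regression the two slopes multiply to r^2, so the two
# lines coincide only when r^2 == 1.
coef(fit_label)[[2]] * coef(fit_plot)[[2]]
summary(fit_label)$r.squared
```

Swapping the mapping to aes(x = Sepal.Width, y = Sepal.Length), or flipping the formula inside eqlabels(), makes the label and the line agree; the fixed label positions then need adjusting to the new axis ranges.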

Tekill
  • Thanks @Yourinium. No, that doesn't work either (even when adding a fourth closing parenthesis at the end of your expression, which I believe is missing, as well as the "." in as.data.frame()). I get the error: Error in eval(expr, envir, enclos) : argument is missing, with no default – jpinelo Nov 28 '15 at 14:37
  • Right now, if I execute eqlabels(iris), I get the following output: `[1] "italic(y) == \"6.53\" + \"-0.223\" * italic(x) * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.014\""`. Is that what you expect? I want to make sure I am tracking with you – Tekill Nov 28 '15 at 15:03
  • I can get the right output from the function `eqlabels()` for one group, like you are doing here. The difficulty is in the next step: using the function with dplyr's group_by() – jpinelo Nov 28 '15 at 15:07
  • Thanks @Yourinium. I know how to make it work with ddply, but since I don't want to use the discontinued plyr package, I was precisely trying to replace ddply with some function(s) from the dplyr package. This is where the problem starts. – jpinelo Nov 28 '15 at 16:05
  • Well, now that I really understood what you were asking, check out my edited response. Hopefully, it moves you in the right direction – Tekill Nov 28 '15 at 18:25
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/96435/discussion-between-yourinium-and-jpinelo). – Tekill Nov 28 '15 at 18:59