Extracting origin and slope values from facet-grided ggplot2 scatter plots [R]

Question

I've been helped by @Ista to produce the follow code. From this code, I now need to save:

Q1: the average 'm' value at nR = 0 for each of the 8 scatter plots;
Q2: the value of the slope of the regression (shown here with stat_smooth) for only a subset of the data (let's say, all data from nR = 0 to nR = 50). As you can see in the example, the regression is computed for the whole dataset;
data should be saved as to be reused for another visualisation with ggplot2. I'm not sure if this point matters, but I'm mentioning it just in case.

require(reshape2)
library(ggplot2)
library(RColorBrewer)

md = read.csv(file="http://dl.dropboxusercontent.com/u/73950/rob-136.csv", sep=",", header=TRUE)

dM = melt(md,c("id"))
# split variable out into its components
dM <- cbind(dM,
        colsplit(dM$variable,
                 pattern = "_",
                 names = c("Nm", "order", "category"))) 
# no longer need variable, as it is represented by the combination of Nm, order, and category
dM$variable <- NULL
# rearrange putting category in the columns
dM <- dcast(dM, ... ~ category, value.var = "value")

# plot
p = ggplot(dM, aes(x=nR ,y=m))
p = p + scale_y_continuous(name="m")+ scale_x_continuous(name="nR") + xlim(0,136)
p = p + facet_grid(order~Nm)+ ggtitle("Title")
p = p + stat_bin2d(bins=50)
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
p = p + scale_fill_gradientn(colours = myPalette(100))
p = p + theme(legend.position="none")
p = p + stat_smooth(method = 'lm')
p

enter image description here

I hope this is clear enough, but please let me know if I'm not making sense...

Cheers!

A few comments. Why do you keep assigning everything to p and overwrite it? Make one statement, where "layers" are separated by a plus sign. Have you considered doing the analysis outside, and then introduce the results (fitted values) via special layers (`geom_line`)? For extracting stat results, see http://stackoverflow.com/questions/9789871/method-to-extract-stat-smooth-line-fit — Roman Luštrik, Feb 11 '14 at 22:28
I use "p = p +" to switch some lines on and off when so needed. I find it practical. I could, indeed, do the computations elsewhere, but I'd like to do it in one shot if possible, from the newly arranged dM dataset. — Lucien S., Feb 11 '14 at 22:32
I may have hastened myself. The solutions proposed by James and Brian do not output model parameters, just predictions and errors. — Roman Luštrik, Feb 11 '14 at 22:37
Thanks for looking into it! I don't absolutely *need* to have it extracted from stat_smooth, I guess getting one of R's linear fitter to do the job would be just as fine. What's important in this case, is that said fitting must be done on each scatter plots, from subsets of Dm. — Lucien S., Feb 11 '14 at 22:43

Roman Luštrik · Accepted Answer · 2014-02-11T23:07:51.570

I would do it like this.

myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))

# plot
ggplot(dM, aes(x = nR, y = m)) +
  scale_y_continuous(name = "m") + 
  scale_x_continuous(name = "nR") + 
  xlim(0, 136) +
  facet_grid(order ~ Nm) + 
  ggtitle("Title") +
  stat_bin2d(bins = 50) +
  scale_fill_gradientn(colours = myPalette(100)) +
  theme(legend.position="none") +
  stat_smooth(method = "lm")

out <- by(data = dM, INDICES = list(dM$order, dM$Nm), FUN = function(x) {
  model <- lm(m ~ nR, data = x)
  data.frame(
    intercept = coef(model)["(Intercept)"], 
    nR = coef(model)["nR"], 
    nM = paste(unique(x$Nm), "/", unique(x$order), sep = "")
    )
})
do.call("rbind", out)

              intercept            nR     nM
(Intercept)  0.07883183  0.0190254723 FS/GED
(Intercept)1 0.11683689  0.0007879976 FS/NAR
(Intercept)2 0.14945769  0.0036112481 N2/GED
(Intercept)3 0.16427558  0.0017356170 N2/NAR
(Intercept)4 0.48951709  0.0025201291 SW/GED
(Intercept)5 0.51569334  0.0011353692 SW/NAR
(Intercept)6 0.65299500  0.0012065830 VE/GED
(Intercept)7 0.64931290 -0.0001557808 VE/NAR

EDIT

Or, you can introduce a second term into your model equation and produce "curvy" curves.

ggplot(dM, aes(x = nR, y = m)) +
  scale_y_continuous(name = "m") + 
  scale_x_continuous(name = "nR") + 
  xlim(0, 136) +
  facet_grid(order ~ Nm) + 
  ggtitle("Title") +
  stat_bin2d(bins = 50) +
  scale_fill_gradientn(colours = myPalette(100)) +
  theme(legend.position="none") +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2))

out <- by(data = dM, INDICES = list(dM$order, dM$Nm), FUN = function(x) {
  x <- na.omit(x)
  model <- lm(m ~ poly(nR, 2), data = x)
  coef(model) # you will need to explicitly add where the data has come from, see my original post above
})
do.call("rbind", out)

     (Intercept) poly(nR, 2)1 poly(nR, 2)2
[1,]   0.2137094    1.9243960   0.36091764
[2,]   0.1585034    1.7404097  -0.10095676
[3,]   0.2945133    5.2416769   1.54143778
[4,]   0.2569599    3.8638288   0.95676753
[5,]   0.5992618    4.0801451   1.38542308
[6,]   0.5752421    2.4440650   0.08027183
[7,]   0.7034230    1.8407400  -0.20495917
[8,]   0.6411417   -0.3364163  -0.02258523

enter image description here

Wow, this is NICE @Roman, thanks a lot! Do you think your approach can be adapted to get the linear function (linear is ok for now by the way) to fit only up to nR = 50? — Lucien S., Feb 11 '14 at 23:45
What do you mean by 'linear function', @Rodolphe? Once you have the intercept and coefficients for both 'x' terms (`poly(nR, 2)`) you can calculate the fitted values (the line) to any `nR` range you want. Even extrapolate beyond the range that was used to fit the line (dangerous). — Roman Luštrik, Feb 12 '14 at 09:16
Oh sure, I mean that the fitting shouldn't take into account all the dataset, but rather a subset (where nR value don't go further than 50). at the line "model <- lm(m ~ nR, data = x)", could it be something like "m ~ (nR <= 50)"? — Lucien S., Feb 12 '14 at 13:34

Extracting origin and slope values from facet-grided ggplot2 scatter plots [R]

1 Answers1