I have a scatterplot of x versus y. I have drawn an abline down the middle of the plot. I want to calculate the variance of the points on the left of the abline and I want to calculate the variance of the points on the right of the abline. This is most likely a relatively simple problem, but I'm struggling to find a solution. Any advice is appreciated. Thanks in advance.

    x = rnorm(100,mean=12,sd=2)
    y = rnorm(100,mean=20,sd=5)
    data = as.data.frame(cbind(x,y))
    plot(x=x,y=y,type="p")
    abline(v=12,col="red")
tesseracT
  • Please read: [reproducible examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and [help/mcve](http://stackoverflow.com/help/mcve). – r2evans Oct 05 '16 at 04:02
  • Take a look at `?resid` - `resid(lm(y ~ x)) > 0` for instance for defining groups on either side of a line of best fit. Maybe - `tapply(y, resid(lm(y~x))>0, var)` but you're going to have to clarify exactly what it is you want. – thelatemail Oct 05 '16 at 04:03
  • I apologize, I have included some example code. When looking at the plot, my goal is to take points that are to the left of the red line and find the variance. Then I want to find the variance of the points on the right side of the line so that I can compare the variances. – tesseracT Oct 05 '16 at 04:17

1 Answer

In your sample code you have the vertical line x = 12, drawn with `abline(v = 12)`. It splits your data points (x, y) into two groups, x < 12 (left of the line) and x >= 12 (on or right of the line). Taking the variance of y within each group is straightforward:

    var(y[x < 12])
    var(y[x >= 12])

But we can also use a single call to tapply:

    tapply(y, x < 12, FUN = var)
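
With a logical grouping vector, the result is labelled `FALSE` (here x >= 12) and `TRUE` (x < 12), which is easy to misread. As a minimal sketch, the groups can be given explicit labels. The seed and the names `side` and `v` are assumptions added for illustration; they are not part of the question's code.

```r
# Assumption: a fixed seed, added only so the sketch is reproducible
set.seed(1)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

# Label each point by the side of the vertical line x = 12 it falls on
side <- ifelse(x < 12, "left", "right")

# Variance of y within each group, returned with readable names
v <- tapply(y, side, FUN = var)
print(v)

# Each entry matches the corresponding manual subset
print(isTRUE(all.equal(v[["left"]],  var(y[x < 12]))))
print(isTRUE(all.equal(v[["right"]], var(y[x >= 12]))))
```

If the variance of x on each side is wanted instead, replace the first argument of `tapply` with `x`.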

More generally if you have a line y = a * x + b, where a is slope and b is intercept, your data points (x, y) will be split into two groups: y < a * x + b (below the line) and y >= a * x + b (above the line), so that you may use

    tapply(y, y < a * x + b, FUN = var)
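
As a rough worked example of the sloped-line case, the sketch below uses a hypothetical line with slope `a = 1` and intercept `b = 8`, chosen only because it passes through the centre of the simulated data at (12, 20). The seed and the names `below` and `v_line` are likewise assumptions for illustration.

```r
# Assumption: a fixed seed, added only so the sketch is reproducible
set.seed(1)
x <- rnorm(100, mean = 12, sd = 2)
y <- rnorm(100, mean = 20, sd = 5)

a <- 1  # hypothetical slope
b <- 8  # hypothetical intercept; y = x + 8 passes through (12, 20)

# TRUE for points strictly below the line, FALSE for points on or above it
below <- y < a * x + b

# Variance of y within each group, with readable labels
v_line <- tapply(y, ifelse(below, "below", "above"), FUN = var)
print(v_line)
```

To draw this line on the plot, note that `abline()` names its arguments the other way round: its `a` is the intercept and its `b` is the slope, so the call would be `abline(a = b, b = a, col = "red")`.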
Zheyuan Li