Creating data table of points above/below abline in ggplot2

Question

Is it possible to identify data points above a geom_abline in ggplot, and to create a new data table separating these data points using data.table?

I have a panel dataset with 150 unique IDs and have fit a fixed effects model using plm(). Here is a sample of the dataset:

data <- data.frame(ID = c(1,1,1,1,2,2,3,3,3),
                   year = c(1,2,3,4,1,2,1,2,3),
                   progenyMean = c(90,78,92,69,86,73,82,85,91),
                   damMean = c(89,89,72,98,95,92,94,87,89))

ID, year, progenyMean, damMean
1, 1, 70, 69
1, 2, 68, 69
1, 3, 72, 72
1, 4, 69, 68
2, 1, 76, 75
2, 2, 73, 80
3, 1, 72, 74
3, 2, 75, 67
3, 3, 71, 69

# Fixed Effects Model in plm
library(plm)
fixed <- plm(progenyMean ~ damMean, data, model = "within", index = c("ID","year"))

I have plotted the response progenyMean vs damMean using the following code:

library(ggplot2)

plotFunction <- function(aggData, year){

  # The intercept 71.09 was calculated using the mean of fixef(fixed)
  ggplot(aggData, aes(x=damMeanCentered, y=progenyMean3Y)) +
    geom_point() +
    geom_abline(slope=fixed$coefficients, intercept=71.09, colour='dodgerblue1', size=1)

}

plotFunction(data, '(2005 - 2012)')

Is it possible to identify the points above/below the geom_abline in ggplot and create a new data table separating these data points using data.table?

If you already have the equation for your line, it's not a ggplot or data.table operation so much as doing the algebra of whether each point is greater or less than the predicted y-value at its x-value — camille, Aug 26 '21 at 16:35
For example https://stackoverflow.com/q/62581291/5325862, https://stackoverflow.com/q/12799457/5325862 — camille, Aug 26 '21 at 16:42
@camille I attempted this method in the post you linked, however it doesn't seem to work (possibly because I'm working with panel data) — codemachino, Aug 26 '21 at 16:57
I don't see why that would change things (I could be missing something). You have the equation of a line (y = mx + b, filling in whatever your slope and y-intercept are). You have x-values (dam quality), and you have observed y-values (progeny quality). Use that equation to get each x-value's predicted y-value, then check if y_observed > y_predicted — camille, Aug 26 '21 at 17:36
Why does your intercept say it's 71.09 but in the graph it looks like it is closer to 58? — Dean MacGregor, Aug 27 '21 at 11:03
@DeanMacGregor The variable on the x-axis has been centered, so the x-axis value 0 is where the intercept hits 71.09 — codemachino, Aug 27 '21 at 11:49
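
As a rough sketch of the algebra described in the comments above (not the asker's actual code): compute each point's predicted y-value from the line, then compare it with the observed value. The data frame name dat and the slope of 0.5 are hypothetical placeholders; the intercept 71.09 is the one quoted in the question, and the x-variable is centered as the asker describes. In practice the slope would come from the fitted model.

```r
# Sample values taken from the data.frame in the question
dat <- data.frame(
  progenyMean = c(90, 78, 92, 69, 86, 73, 82, 85, 91),
  damMean     = c(89, 89, 72, 98, 95, 92, 94, 87, 89)
)

slope     <- 0.5    # hypothetical slope, for illustration only
intercept <- 71.09  # intercept quoted in the question

# Center the x-variable so that x = 0 corresponds to the mean dam value
dat$damMeanCentered <- dat$damMean - mean(dat$damMean)

# y = m * x + b, evaluated at each point's (centered) x-value
dat$predicted <- intercept + slope * dat$damMeanCentered

# TRUE when the observed value lies strictly above the line
dat$above <- dat$progenyMean > dat$predicted

print(dat)
```

Whether points that fall exactly on the line count as "above" or "below" is a choice; here they are treated as not above.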

score 1 · Answer 1 · answered Aug 26 '21 at 15:02


It is not clear where the intercept came from, but the trick is to add a prediction column to your dataset using the regression model (in your case, fixed), then keep only the rows whose actual value is higher than the prediction.

library(dplyr)

data %>%
  mutate(predict = predict(fixed, newdata = data)) %>%
  filter(progenyMean > predict)

answered Aug 26 '21 at 15:02

M Daaboul


This seems to produce an error message: Problem with `mutate()` column `predict`. ℹ `predict = predict(fixed, newdata = period1_above)`. x non-conformable arguments – codemachino Aug 26 '21 at 15:08
Is it possible to do this using `data.table` instead of `dplyr` ? – codemachino Aug 26 '21 at 15:08
Re-create data with your first line of code before running this. – M Daaboul Aug 26 '21 at 17:31

score 1 · Answer 2 · answered Aug 27 '21 at 11:08

First, make the predictions:

library(data.table)
setDT(data)[, newpredict := predict(fixed, newdata = data)]  # setDT() converts the data.frame to a data.table by reference

It's not clear what you want the new data.table to look like, but you'd get the rows whose values are above the predictions by doing:

data[progenyMean > newpredict]

For below, you'd obviously just change the > to <.
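
If the goal is two separate tables, one possible data.table sketch is below. It is an illustration under stated assumptions rather than a tested recipe for the plm model: the names dt, lineY, side, above and below are made up here, and the slope and intercept are hypothetical numbers standing in for whatever line is drawn with geom_abline.

```r
library(data.table)

# Sample values taken from the data.frame in the question
dt <- data.table(
  ID          = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  year        = c(1, 2, 3, 4, 1, 2, 1, 2, 3),
  progenyMean = c(90, 78, 92, 69, 86, 73, 82, 85, 91),
  damMean     = c(89, 89, 72, 98, 95, 92, 94, 87, 89)
)

slope     <- 0.5  # hypothetical, for illustration only
intercept <- 40   # hypothetical, for illustration only

# Height of the line at each point's x-value
dt[, lineY := intercept + slope * damMean]

# Label each row; points exactly on the line are grouped with "below"
dt[, side := fifelse(progenyMean > lineY, "above", "below")]

# Two separate tables, one per side of the line
above <- dt[side == "above"]
below <- dt[side == "below"]

print(above)
print(below)
```

Keeping the side column in the original table also makes it easy to colour the points by side in the plot, should that be useful.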
