Is it possible to identify data points above a geom_abline in ggplot, and to create a new data table separating these data points using data.table?

I have a panel dataset with 150 unique IDs and have fit a fixed effects model using plm(). Here is a sample of the dataset:

data <- data.frame(ID = c(1,1,1,1,2,2,3,3,3),
                   year = c(1,2,3,4,1,2,1,2,3),
                   progenyMean = c(90,78,92,69,86,73,82,85,91),
                   damMean = c(89,89,72,98,95,92,94,87,89))

ID, year, progenyMean, damMean
1, 1, 70, 69
1, 2, 68, 69
1, 3, 72, 72
1, 4, 69, 68
2, 1, 76, 75
2, 2, 73, 80
3, 1, 72, 74
3, 2, 75, 67
3, 3, 71, 69

# Fixed Effects Model in plm
library(plm)
fixed <- plm(progenyMean ~ damMean, data = data, model = "within", index = c("ID", "year"))

I have plotted the response progenyMean vs damMean using the following code:

plotFunction <- function(aggData, year){
  
  ggplot(aggData, aes(x=damMeanCentered, y=progenyMean3Y)) + 
    geom_point() + 
    geom_abline(slope=fixed$coefficients, intercept=71.09, colour='dodgerblue1', size=1)
# The intercept 71.09 was calculated using the mean of fixef(fixed)
  
}

plotFunction(data, '(2005 - 2012)')

[Image: scatter plot produced by the code above]

Is it possible to identify the points above/below the geom_abline in ggplot and create a new data table separating these data points using data.table?

codemachino
  • If you already have the equation for your line, it's not a ggplot or data.table operation so much as doing the algebra of whether each point is greater or less than the predicted y-value at its x-value – camille Aug 26 '21 at 16:35
  • For example https://stackoverflow.com/q/62581291/5325862, https://stackoverflow.com/q/12799457/5325862 – camille Aug 26 '21 at 16:42
  • @camille I attempted this method in the post you linked, however it doesn't seem to work (possibly because I'm working with panel data) – codemachino Aug 26 '21 at 16:57
  • I don't see why that would change things (I could be missing something). You have the equation of a line (y = mx + b, filling in whatever your slope and y-intercept are). You have x-values (dam quality), and you have observed y-values (progeny quality). Use that equation to get each x-value's predicted y-value, then check if y_observed > y_predicted – camille Aug 26 '21 at 17:36
  • Why does your intercept say it's 71.09 but in the graph it looks like it is closer to 58? – Dean MacGregor Aug 27 '21 at 11:03
  • @DeanMacGregor The variable on the x-axis has been centered, so the x-axis value 0 is where the intercept hits 71.09 – codemachino Aug 27 '21 at 11:49
  • @codemachino ahhh I spaced it. Thanks. – Dean MacGregor Aug 27 '21 at 11:58
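
The approach camille describes in the comments can be sketched with data.table as follows. This is only a minimal illustration on the sample data from the question: the `slope` and `intercept` values are made-up placeholders (not results from the fitted model), and the column names `lineY` and `side` are hypothetical. Substitute the slope and intercept actually passed to geom_abline, and the same x and y columns used in the plot.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ID          = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  year        = c(1, 2, 3, 4, 1, 2, 1, 2, 3),
  progenyMean = c(90, 78, 92, 69, 86, 73, 82, 85, 91),
  damMean     = c(89, 89, 72, 98, 95, 92, 94, 87, 89)
)

# Placeholder line parameters (assumed for illustration only);
# use the values given to geom_abline(slope = ..., intercept = ...)
slope     <- 0.5
intercept <- 40

# y-value of the line at each point's x-value (y = m * x + b)
dt[, lineY := intercept + slope * damMean]

# Label each point; points exactly on the line are counted as "below" here
dt[, side := fifelse(progenyMean > lineY, "above", "below")]

# Separate tables for the two groups
above <- dt[side == "above"]
below <- dt[side == "below"]
```

Because `side` is an ordinary column, it can also be mapped to an aesthetic in the plot (for example `aes(colour = side)`) to check visually that the split matches the line.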

2 Answers

1

It is not clear where the intercept came from, but the trick is to add a prediction column to your dataset using the regression model (in your case, fixed), then keep only the rows whose actual value is higher than the prediction.

library(dplyr)

data %>%
  mutate(predict = predict(fixed, newdata = data)) %>%
  filter(progenyMean > predict)

M Daaboul
1

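First convert the data to a data.table and make the predictions

setDT(data)  # from library(data.table); converts the data.frame by reference
data[, newpredict := predict(fixed, newdata = data)]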

It's not clear what you want the new data.table to look like but you'd get the values above predictions by doing

data[progenyMean > newpredict]

For below, you'd obviously just change the > to <.
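
If the comparison should match the line drawn by geom_abline in the question exactly (slope from the model coefficient, intercept from the mean of the fixed effects), one option is to rebuild that line by hand instead of relying on predict(), whose handling of within models may differ between plm versions. The following is a rough sketch under that assumption; the names `df`, `dt`, `lineY`, `above` and `below` are illustrative, and it uses the uncentered sample columns rather than the centered ones from the plot.

```r
library(plm)
library(data.table)

# Sample data from the question
df <- data.frame(
  ID          = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  year        = c(1, 2, 3, 4, 1, 2, 1, 2, 3),
  progenyMean = c(90, 78, 92, 69, 86, 73, 82, 85, 91),
  damMean     = c(89, 89, 72, 98, 95, 92, 94, 87, 89)
)

# Fixed effects (within) model, as in the question
fixed <- plm(progenyMean ~ damMean, data = df,
             model = "within", index = c("ID", "year"))

# Line parameters as described in the question:
# slope = model coefficient, intercept = mean of the individual fixed effects
slope     <- unname(coef(fixed)["damMean"])
intercept <- mean(as.numeric(fixef(fixed)))

# Compare each observation with the line's y-value at its x-value
dt <- as.data.table(df)
dt[, lineY := intercept + slope * damMean]

above <- dt[progenyMean > lineY]    # points above the line
below <- dt[progenyMean <= lineY]   # points on or below the line
```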

Dean MacGregor