
I am new to R and am having trouble visualizing data with the ggplot2 package. I would like to create a linear regression plot in which all points inside a specific area share one color and all points outside that area share another. I would also like to change the background of that area to draw attention to it.

The graph I would like to make should look similar to the one below.

Target graph

So far, I have only been able to create the simple graph below.

My current graph

The code that generates my current graph is below.

library(ggplot2)

g <- ggplot(df, aes(x = real, y = predicted))
g + geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "black") +
  theme_classic() +
  # Band limits: the identity line shifted up and down by s_est
  geom_abline(intercept = s_est, slope = 1, color = "darkgrey") +
  geom_abline(intercept = -s_est, slope = 1, color = "darkgrey") +
  ggtitle("Test Set")

The first 100 rows of the data and the value of s_est are as follows.

df <- structure(list(real = c(3.33, 5.92, 5.3, 6, 6.96, 7.03, 6.6, 
7.92, 8.3, 10.52, 6.34, 4.38, 4.59, 9.8, 10.3, 10, 8.25, 6, 7.44, 
6.66, 9.09, 9.22, 9.7, 4.82, 6.1, 4.92, 4.29, 3.22, 6.01, 9.05, 
9.04, 4.85, 8.22, 6.7, 6.7, 4.62, 4.82, 8.52, 5.24, 8.15, 7, 
10, 7, 5.18, 5.93, 8.4, 7.7, 7.24, 9.54, 6.06, 8, 4.35, 4.2, 
4.51, 2.48, 9.1, 5.34, 4.19, 8.05, 8.55, 6.55, 11.4, 10.96, 9.64, 
4.49, 6, 6.9, 6.17, 9, 6.92, 3.77, 4.22, 8.92, 7.55, 7.6, 6.82, 
5.32, 8.39, 5.09, 10.96, 6.68, 9.4, 5.04, 5.59, 9.21, 9.7, 6.98, 
6.17, 8.89, 9.74, 6.08, 6.7, 4.41, 3.57, 7.12, 6.09, 6.11, 6.82, 
7.3, 6.77), predicted = c(3.3049898147583, 7.57794666290283, 
5.81329345703125, 3.71067190170288, 6.35026741027832, 6.59200620651245, 
6.32752990722656, 7.13449430465698, 7.78791570663452, 8.61589622497559, 
7.72269868850708, 5.33322525024414, 7.26069974899292, 9.23727989196777, 
8.27904891967773, 7.55226612091064, 5.94742393493652, 4.07633399963379, 
7.67468595504761, 5.64575576782227, 7.85368394851685, 7.73117685317993, 
10.2843132019043, 4.96891403198242, 6.29262351989746, 6.03091764450073, 
6.71697568893433, 3.50744342803955, 6.46608829498291, 8.20327758789062, 
7.52885150909424, 4.58155632019043, 6.1530909538269, 6.49482202529907, 
5.28225088119507, 4.44094896316528, 5.503089427948, 7.79408073425293, 
5.6220269203186, 7.12402009963989, 6.30716276168823, 7.15596580505371, 
7.26271867752075, 5.41359615325928, 5.68268489837646, 6.81329536437988, 
7.10254955291748, 8.64251136779785, 8.65674114227295, 5.94885206222534, 
9.24687099456787, 5.93400239944458, 5.66134691238403, 6.14793062210083, 
2.94440221786499, 9.21078777313232, 5.96825170516968, 4.69157028198242, 
7.91313886642456, 6.90836668014526, 6.72082805633545, 9.95611953735352, 
9.15732383728027, 6.68948268890381, 3.60811305046082, 7.42742109298706, 
6.05647945404053, 6.2350025177002, 8.12950134277344, 7.56590843200684, 
5.3975772857666, 3.48417925834656, 7.63604927062988, 8.04048824310303, 
7.78053188323975, 7.34217929840088, 7.93345308303833, 8.03125, 
5.62498426437378, 4.80621385574341, 5.19631958007812, 7.51661252975464, 
5.43919944763184, 5.5195426940918, 6.10152912139893, 8.25357818603516, 
5.73111486434937, 7.27180528640747, 8.37008285522461, 7.78157567977905, 
7.52273559570312, 4.32158374786377, 6.20211696624756, 4.30103015899658, 
7.89811611175537, 6.88143062591553, 6.74230575561523, 6.75651741027832, 
6.64747190475464, 6.72232007980347)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -100L))
s_est = 4.536

Thank you so much for any help.

  • Welcome to SO, Christine! It would be easier to help you if you provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data so that we can run your code. – stefan Nov 29 '22 at 07:25
  • Thank you for your suggestion. I will modify my question and attach the data. Sorry for my improper question. – christine Nov 30 '22 at 02:26
  • Everything fine. To get you started: To share your data in a reproducible fashion type `dput(head(df, 20))` (for the first 20 rows of your data) in the R console and copy the output as an edit into your original post. Also include the value of `s_est` in your post. – stefan Nov 30 '22 at 07:11
  • Thank you, Stefan, for your detailed instructions. Now I could update my question with the data. Hopefully, I will receive some suggestions or instructions from the SO community. – christine Dec 01 '22 at 02:29

1 Answer


In your target image it looks like the points are colored by a measure of the absolute error: points that fall inside the confidence (?) band are colored blue and points that fall outside are colored red. To achieve the same result, you could map the absolute error (or whatever measure you prefer) to the color aesthetic. To get the coloring right I use scale_color_gradient2 with the midpoint set to s_est, so the colors change from blue to red at the edge of the band. I also set an upper bound for the color gradient, i.e. values with an absolute error greater than or equal to 2 * s_est are all assigned the same "red" color, but you could adjust that if you like.
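If you prefer two flat colors rather than a gradient (one color inside the band, one outside), one option is to map a two-level flag to the color aesthetic. This is only a minimal sketch: the column name `band`, the two colors, and the ten sample rows (the first rows of the question's data, rounded to two decimals) are my own choices for illustration, and it uses the reduced `s_est` so that some points fall outside.

```r
library(ggplot2)

# First ten rows of the question's data, rounded to two decimals
df <- data.frame(
  real      = c(3.33, 5.92, 5.30, 6.00, 6.96, 7.03, 6.60, 7.92, 8.30, 10.52),
  predicted = c(3.30, 7.58, 5.81, 3.71, 6.35, 6.59, 6.33, 7.13, 7.79, 8.62)
)
s_est <- 4.536 / 4

# Flag each point as inside or outside the band real +/- s_est
# (`band` is a hypothetical column name)
df$band <- ifelse(abs(df$predicted - df$real) <= s_est, "inside", "outside")

p <- ggplot(df, aes(x = real, y = predicted)) +
  geom_point(aes(color = band)) +
  geom_abline(intercept = 0, slope = 1, color = "black") +
  # One fixed color per level of the flag
  scale_color_manual(values = c(inside = "blue", outside = "red")) +
  labs(title = "Test Set", color = NULL) +
  theme_classic()
p
```

Because the flag is discrete, the legend shows two entries instead of a color bar.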

To shade the area between your ablines, I first drop the two outer geom_ablines and use a geom_ribbon instead. One drawback is that the ribbon does not extend to the axes but is restricted to the range of the data. To "fix" that I use a small hack: I pass the ribbon a separate dataset in which the range of the real values is extended by 5% of the data range on each side, and I additionally remove the default expansion of the x scale.
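The ribbon only needs its two end points, because the band is a straight strip around the identity line. Here is a minimal sketch of that helper data, again on ten rounded sample rows from the question. The name `ribbon_df` is made up for this example, and `inherit.aes = FALSE` is used here as an alternative to adding a dummy `predicted` column.

```r
library(ggplot2)

# First ten rows of the question's data, rounded to two decimals
df <- data.frame(
  real      = c(3.33, 5.92, 5.30, 6.00, 6.96, 7.03, 6.60, 7.92, 8.30, 10.52),
  predicted = c(3.30, 7.58, 5.81, 3.71, 6.35, 6.59, 6.33, 7.13, 7.79, 8.62)
)
s_est <- 4.536 / 4

# Extend the observed range of `real` by 5% of its width on both sides
rng <- range(df$real)
range_ribbon <- rng + 0.05 * diff(rng) * c(-1, 1)

# Two rows are enough, since the band limits are linear in `real`
ribbon_df <- data.frame(real = range_ribbon)

p <- ggplot(df, aes(x = real, y = predicted)) +
  geom_ribbon(
    data = ribbon_df,
    aes(x = real, ymin = real - s_est, ymax = real + s_est),
    inherit.aes = FALSE, fill = "darkgrey", alpha = 0.2
  ) +
  geom_point() +
  # Without the default expansion the ribbon touches both plot edges
  scale_x_continuous(expand = c(0, 0))
p
```

The x scale is trained on both layers, so its limits become the extended range and the points keep a small margin from the edges.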

Finally, I added coord_equal so that one unit on the x axis has the same length as one unit on the y axis, which keeps the identity line at 45 degrees.

Note: I used a smaller value for s_est (a quarter of the original 4.536), because with the original value hardly any point of the example data would fall outside the confidence band.
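A quick way to see the effect of the band width is to count the points outside the band for both values. This is a minimal base R sketch on ten rounded sample rows from the question; the names `abs_err` and `n_outside` are made up for this example.

```r
# First ten rows of the question's data, rounded to two decimals
df <- data.frame(
  real      = c(3.33, 5.92, 5.30, 6.00, 6.96, 7.03, 6.60, 7.92, 8.30, 10.52),
  predicted = c(3.30, 7.58, 5.81, 3.71, 6.35, 6.59, 6.33, 7.13, 7.79, 8.62)
)

abs_err <- abs(df$predicted - df$real)

# Number of points outside the band for the original and the reduced value
n_outside <- sapply(
  c(original = 4.536, reduced = 4.536 / 4),
  function(s) sum(abs_err > s)
)
n_outside
```

On these ten rows no point lies outside the band with the original value, while three do with the reduced one.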

library(ggplot2)

s_est <- 4.536 / 4

# Absolute Error
df$resid <- abs(df$predicted - df$real)

# Range of "real" values used for the ribbon. Manually expand range by 5%
range_ribbon <- diff(range(df$real))
range_ribbon <- range(df$real) + .05 * range_ribbon * c(-1, 1)

ggplot(df, aes(x = real, y = predicted)) +
  geom_point(aes(color = resid)) +
  geom_abline(intercept = 0, slope = 1, color = "black") +
  geom_ribbon(
    # Two-row helper data; `predicted` is a dummy so the inherited y aesthetic resolves
    data = data.frame(real = range_ribbon, predicted = 0),
    aes(ymin = real - s_est, ymax = real + s_est),
    color = "darkgrey", fill = "darkgrey", alpha = .2
  ) +
  # Remove default expansion of the x scale
  scale_x_continuous(expand = c(0, 0)) +
  # Color gradient. Limit range to 2 * s_est
  scale_color_gradient2(
    midpoint = s_est, low = "blue", high = "red",
    limits = c(0, 2 * s_est),
    oob = scales::oob_squish
  ) +
  labs(title = "Test Set") +
  coord_equal()


  • Thank you so much for spending time helping me with the absolutely perfect solution with a careful explanation for my issue. I learned a lot from your instruction. Once again, thank you so so much!!! – christine Dec 02 '22 at 02:58