1

I have a dataset containing 1,000 values for a model. The values all fall within the same range (y = 40-70), so the points overlap heavily. I'd like to use color to show the density of points converging on a single value (y = 56.72), which I have marked with a horizontal dashed line on the plot below. How can I color the points to show this?

ggplot(data, aes(x = model, y = value)) +
  geom_point(size = 1) +
  geom_hline(yintercept = 56.72,
             linetype = "dashed",
             color = "black")

enter image description here

denis
mef022
  • hi, I have no access to R right now ... but as a start you could do something like count the values on certain points and then use the count as fill or something like that :) – sambold Jul 03 '20 at 20:05
  • See also [my answer here](https://stackoverflow.com/a/58523956/1870254) on how to easily color points by an estimated density. – jan-glx Sep 25 '20 at 13:36

2 Answers

2

I think you should opt for a histogram or density plot:

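library(ggplot2)

# Simulated stand-in for the real data: 500 values centred on 56.72
n <- 500
data <- data.frame(model = rep("model", n), value = rnorm(n, 56.72, 10))

# Histogram of counts with a density curve scaled to the same count axis
ggplot(data, aes(x = value, y = after_stat(count))) +
  geom_histogram(binwidth = 1) +
  geom_density(size = 1) +
  geom_vline(xintercept = 56.72, linetype = "dashed", color = "black") +
  theme_bw()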

enter image description here

Here is your plot with the same data:

ggplot(data, aes(x = model, y = value))+ 
  geom_point(size = 1) + 
  geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")

enter image description here

If your model is iterative and does converge to this value, I suggest plotting the value as a function of the iteration to show the convergence. Another option, which keeps the plot similar to yours, is to dodge the position of the points:

ggplot(data, aes(x = model, y = value))+ 
  geom_point(position = position_dodge2(width = 0.2),
             shape = 1,
             size = 2,
             stroke = 1,
             alpha = 0.5) + 
  geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")

enter image description here
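
The iteration-based suggestion above could look roughly like the sketch below. The original data has no iteration column, so both the `iteration` column and the shrinking-noise simulation are assumptions made purely for illustration; with real model output you would use your own iteration index instead.

```r
library(ggplot2)

# Hypothetical data: values are simulated around the target with noise
# that shrinks as the iteration number grows (an assumption, not the
# poster's actual model output).
set.seed(1)
n <- 1000
target <- 56.72
sim <- data.frame(iteration = seq_len(n))
sim$value <- target + rnorm(n, mean = 0, sd = 10 / sqrt(sim$iteration))

# Plot the value against the iteration, with the target as a dashed line
p_iter <- ggplot(sim, aes(x = iteration, y = value)) +
  geom_point(size = 1, alpha = 0.5) +
  geom_hline(yintercept = target, linetype = "dashed", color = "black") +
  theme_bw()
p_iter
```

Spreading the points along an iteration axis removes most of the overlap, so the convergence toward the dashed line is visible without any color coding.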

Here is a color density plot as you asked:

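library(dplyr)
library(ggplot2)

# The explicit dplyr:: prefixes guard against masking by other packages
data %>%
  # Assign each value to a 1-unit-wide bin
  dplyr::mutate(bin = cut(value, breaks = 10:120)) %>%
  dplyr::group_by(bin) %>%
  # Count the points in each bin and use that count as the density
  dplyr::mutate(density = dplyr::n()) %>%
  ggplot(aes(x = model, y = value, color = density)) +
  geom_point(size = 1) +
  geom_hline(yintercept = 56.72, linetype = "dashed", color = "black") +
  scale_colour_viridis_c(option = "A")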

enter image description here
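
As a possible alternative to the binned count, the color could come from a smooth kernel density estimate. This is only a sketch, not the method used above: it relies on base R's `density()` and `approx()`, keeps the default bandwidth, and regenerates the simulated `data` so that it runs on its own.

```r
library(ggplot2)

# Same simulated stand-in data as above (seed added for reproducibility)
set.seed(1)
n <- 500
data <- data.frame(model = rep("model", n), value = rnorm(n, 56.72, 10))

# Estimate a smooth 1D density of the values, then look up the estimate
# at each observed value by linear interpolation on the density grid.
kde <- density(data$value)
data$density <- approx(kde$x, kde$y, xout = data$value)$y

# Color each point by its estimated density
p_kde <- ggplot(data, aes(x = model, y = value, color = density)) +
  geom_point(size = 1) +
  geom_hline(yintercept = 56.72, linetype = "dashed", color = "black") +
  scale_colour_viridis_c(option = "A")
p_kde
```

Compared with counting per bin, the estimate changes smoothly along the y axis, so the colors do not jump at bin edges; the trade-off is that the result depends on the bandwidth chosen by `density()`.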

jan-glx
denis
  • Thanks for all of the options @denis! The color density throws an error: Error: `n()` must only be used inside dplyr verbs. Is there a work around this? – mef022 Jul 03 '20 at 23:45
  • yes, use `dplyr::n()`. It is because you have `plyr` loaded too. – denis Jul 04 '20 at 07:57
0

I would suggest using the alpha parameter in geom_point. With a value close to 0, each point is nearly transparent, so areas where many points overlap appear darker.

ggplot(data, aes(x=model, y=value)) + 
  geom_point(size=1, alpha = .1) + 
  geom_hline(yintercept=56.72, linetype="dashed", color = "black")
eastclintw00d