I'm trying to create an illustration of a bivariate joint and conditional probability distribution using a contour plot in ggplot2. What I'm trying to recreate is a subset of features of the following image from this site:

enter image description here

(I'm only interested in the joint and conditional distributions, not the marginals.) My main issue is overlaying the Gaussians for $y$ given $x=x_0$, as shown in the image. This would require specifying the conditional distribution of $y$ given $x=x_0$ and somehow drawing it on top of an existing contour plot. So far I haven't even been able to draw the line for $x = x_0$ properly: it does not stretch across the entire y axis.
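For reference, the conditional distribution in question has a closed form when $(x, y)$ is bivariate normal. The symbols $\mu_x, \mu_y$ (means), $\sigma_x, \sigma_y$ (standard deviations) and $\rho$ (correlation) are notation introduced here, not taken from the original post:

$$
y \mid x = x_0 \;\sim\; \mathcal{N}\!\left(\mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x_0 - \mu_x),\;\; \sigma_y^2\,(1 - \rho^2)\right)
$$

With the parameters passed to `mvrnorm` in the code (means $0.5$ and $2.5$, unit variances, covariance $0.6$, hence $\rho = 0.6$), this gives a conditional mean of $2.5 + 0.6\,(x_0 - 0.5)$ and a conditional variance of $1 - 0.6^2 = 0.64$.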

This is the image generated by my code. You can find the code below the image.

enter image description here

library(MASS)
library(ggplot2)
n <- 1e5
# Draw n samples from a bivariate normal with correlated components
x <- mvrnorm(n, mu=c(.5,2.5), Sigma=matrix(c(1,.6,.6,1), ncol=2))
#x <- mvrnorm(n, mu=c(0,0), Sigma=matrix(c(0.8,1,0.8,1), ncol=2))
df = data.frame(x); colnames(df) = c("x","y")

min.x = min(df$x)
max.x = max(df$x)
min.y = min(df$y)
max.y = max(df$y)

commonTheme = list(labs(color="Density",fill="Density",
                        x="x",
                        y="y"),
                   theme_bw(),
                   theme(axis.text.x=element_blank(),
                         axis.ticks.x=element_blank(),
                         axis.text.y=element_blank(),
                         axis.ticks.y=element_blank(),
                         legend.position='none')) 

ggplot(data=df,aes(x,y)) + 
  stat_density2d(aes(fill=..level..,alpha=..level..), geom='polygon', colour='black') + 
  scale_fill_continuous(low="yellow", high="red") +
  guides(alpha="none") +
  # Line at x = x_0 (here 0.4*max(x)) and a point on it
  geom_line(data=data.frame(x=0.4*max(df$x), y=min(df$y):max(df$y)), size=1.0, colour = "blue")+
  geom_point(data=data.frame(x=0.4*max(df$x), y=0.41*max(df$y)), colour="blue", size=3) +
  #scale_x_continuous(limits = c(min.x, max.x)) +
  #scale_y_continuous(limits = c(min.y, max.y)) +
  commonTheme
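
One possible direction, as a minimal sketch rather than a definitive solution: `geom_vline()` draws across the whole panel regardless of the data range, and the conditional Gaussian can be computed beforehand and drawn sideways with `geom_path()`. The value `x0 = 1.5` and the stretch factor `curve_scale` are hypothetical choices for illustration, and `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(MASS)
library(ggplot2)

set.seed(1)

# Parameters taken from the mvrnorm() call in the question
mu    <- c(0.5, 2.5)
Sigma <- matrix(c(1, 0.6, 0.6, 1), ncol = 2)

df <- data.frame(mvrnorm(1e4, mu = mu, Sigma = Sigma))
colnames(df) <- c("x", "y")

x0 <- 1.5  # hypothetical conditioning value, not from the original post

# Conditional distribution of y given x = x0 for a bivariate normal
cond_mean <- mu[2] + Sigma[2, 1] / Sigma[1, 1] * (x0 - mu[1])
cond_sd   <- sqrt(Sigma[2, 2] - Sigma[2, 1]^2 / Sigma[1, 1])

# Density curve drawn sideways: the density value becomes a horizontal
# offset from the line x = x0 (curve_scale is an arbitrary stretch factor)
curve_scale <- 1.5
y_grid <- seq(cond_mean - 4 * cond_sd, cond_mean + 4 * cond_sd,
              length.out = 200)
cond_curve <- data.frame(
  x = x0 + curve_scale * dnorm(y_grid, mean = cond_mean, sd = cond_sd),
  y = y_grid
)

p <- ggplot(df, aes(x, y)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  geom = "polygon", colour = "black") +
  scale_fill_continuous(low = "yellow", high = "red") +
  guides(alpha = "none") +
  # geom_vline spans the full panel height
  geom_vline(xintercept = x0, colour = "blue") +
  # geom_path keeps the row order; geom_line would reorder by x
  geom_path(data = cond_curve, colour = "blue") +
  # mark the conditional mean on the line
  geom_point(data = data.frame(x = x0, y = cond_mean),
             colour = "blue", size = 3) +
  theme_bw() +
  theme(legend.position = "none")

print(p)
```

Note that `min(df$y):max(df$y)` in the original call steps by 1 starting from the minimum, so the resulting sequence can stop short of the maximum; `geom_vline()` sidesteps that entirely.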

Any advice?

Des Grieux
  • Strange. In my case all of the code is executed without errors. The values for sigma are largely arbitrary as long as we get a shape resembling the bivariate normal distribution. There are probably more reasonable sigma parameters. – Des Grieux Sep 26 '17 at 05:31
  • If your columns are identical, it's not going to be a full-rank matrix. – IRTFM Sep 26 '17 at 05:35
  • [This](https://stats.stackexchange.com/questions/31726/scatterplot-with-contour-heat-overlay) is the original code that I edited. The original code works fine even though the matrix columns are identical. – Des Grieux Sep 26 '17 at 05:38
  • No. They're not equal. That matrix is rank 2. Yours is rank 1. – IRTFM Sep 26 '17 at 05:42
  • You're right. I'll reinstate the values from the original post. – Des Grieux Sep 26 '17 at 05:46
  • Take a look at this question: https://stackoverflow.com/questions/35717353/split-violin-plot-with-ggplot2/35719046#35719046. In short, calculate all the data before plotting, then plot a polygon. – missuse Sep 26 '17 at 07:49
