
I want to make a bubble plot with the regression line from an analysis I ran predicting the proportion of votes for Hillary Clinton over Bernie Sanders in each county's Democratic primary. However, geom_smooth() keeps drawing a line with the wrong slope and intercept. The lm output is this:

             Estimate   Std. Error  t-value  p-value
(Intercept)  0.146790   0.058166    2.524    0.011737 *  
AssoCareer  -0.102984   0.020378   -5.054    4.97e-07 ***

But the graph comes out looking like this:

[plot image not available]

My code looks like this:

library(ggplot2)

ggplot(data, aes(x = AssoCareer, y = Prop.H)) +
  # Constant aesthetics are set outside aes(); only Bins is mapped to size.
  geom_point(aes(size = Bins), shape = 21, colour = "green",
             fill = "cyan1", alpha = 0.2) +
  # Zoom the y-axis without dropping any points from the fit.
  coord_cartesian(ylim = c(0, 1.5)) +
  # Fits Prop.H ~ AssoCareer on the plotted data.
  geom_smooth(method = "lm", na.rm = TRUE) +
  xlab("County Level Explicit Association Career-Men") +
  ylab("Proportion of Hillary Voters") +
  ggtitle("Proportion of votes for Clinton over Bernie")

Can anyone tell why this might be happening?

Frank
Skowski_P
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Axeman Sep 12 '16 at 20:19
  • Judging from your plot, an intercept of ~0.14 seems quite unlikely, no? That's basically at the center bottom of the point cloud. – Axeman Sep 12 '16 at 20:24
  • Was the lm() summary generated with na.action = na.omit (the default)? You can check whether NA values are the culprit by first filtering the data with complete.cases(), then running both base lm() and ggplot2's stat_smooth() / geom_smooth() to see whether they compute the same coefficients. – Sandipan Dey Sep 12 '16 at 20:39
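
The check suggested in the last comment could look roughly like the sketch below. It is not the asker's analysis: the data frame is simulated and purely hypothetical, and only the column names AssoCareer and Prop.H are taken from the question. It assumes ggplot2 is installed. It filters to complete cases, fits lm(), then recovers the line that geom_smooth() actually draws via layer_data() and compares the two sets of coefficients.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's `data`; values are simulated.
set.seed(1)
n <- 200
data <- data.frame(AssoCareer = rnorm(n))
data$Prop.H <- 0.5 - 0.1 * data$AssoCareer + rnorm(n, sd = 0.1)
data$Prop.H[sample(n, 10)] <- NA  # inject a few missing values

# Keep only rows where both plotted variables are present.
complete <- data[complete.cases(data[, c("AssoCareer", "Prop.H")]), ]

# Coefficients from base lm() on the filtered data.
fit <- lm(Prop.H ~ AssoCareer, data = complete)
lm_coef <- unname(coef(fit))

# Build the plot and pull out the points of the line geom_smooth() draws
# (layer 2 is the smooth layer).
p <- ggplot(complete, aes(x = AssoCareer, y = Prop.H)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
smooth <- layer_data(p, 2)

# The drawn points lie on a straight line, so refitting recovers
# its intercept and slope.
smooth_coef <- unname(coef(lm(y ~ x, data = smooth)))

print(rbind(lm = lm_coef, geom_smooth = smooth_coef))
```

If the two rows agree on the real data as well, missing values are not the cause. geom_smooth(method = "lm") fits an unweighted y ~ x on whatever is mapped in the plot, so any difference from the reported summary would then have to come from how the original lm() call was specified (for example weights, extra terms, or a different response).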

0 Answers