library(readr)
library(ggplot2)
MERGED2014_15_PP <- read_csv("~/Desktop/R/Database Camp - Day 1/CollegeScorecard_Raw_Data/MERGED2014_15_PP.csv")
a1 = subset(MERGED2014_15_PP, STABBR == "AL")
a3 = geom_point(aes(color=factor(CITY)))
a8 = subset(a1, !(SATMTMID=="NULL"), !(SATVRMID=="NULL"))
a9 = ggplot(a8, aes(y = as.numeric(SATVRMID), x = as.numeric(SATMTMID), text = INSTNM, text2 = CITY))
a11 = geom_smooth(method = lm)
a12 = geom_text(aes(label=""))
a9 + a3 + a12 + a11

When I run the code above, no regression line appears, yet I receive no errors. The data come from the US Department of Education College Scorecard. What could be causing this, and how can I fix it?

The code produced the plot below:

[image: Example of R output]

After reducing a8 to only the SATMTMID and SATVRMID columns, dput() returns:

structure(list(SATMTMID = c("420", "565", "590", "430", "565", 
"509", "588", "560", "400", "490", "485", "558", "465", "528", 
"484", "450", "558", "518", "538", "424", "465"), SATVRMID = c("424", 
"570", "595", "425", "555", "486", "575", "560", "420", "510", 
"495", "550", "470", "548", "506", "476", "565", "510", "535", 
"448", "455")), .Names = c("SATMTMID", "SATVRMID"), row.names = c(NA, 
-21L), class = c("tbl_df", "tbl", "data.frame"))
  • can't say without a reproducible example. Sorry. – Ben Bolker Aug 01 '17 at 18:02
  • @BenBolker What could be added to make it more easily reproducible? –  Aug 01 '17 at 18:03
  • See https://stackoverflow.com/help/mcve , https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example (off-topic, but you could leave out most of the packages you've loaded ... you should only need `ggplot2` for this) – Ben Bolker Aug 01 '17 at 18:05
  • posting a link to the image could be a bit of a help in guessing too. – Ben Bolker Aug 01 '17 at 18:06
  • ... and this has nothing to do with rmarkdown, as far as I know. Why did you include that tag ... ? – Ben Bolker Aug 01 '17 at 18:07
  • @BenBolker I am working on creating a smaller sample. I included the Markdown because I am running it in an rmarkdown file. –  Aug 01 '17 at 18:09
  • as part of minimizing the example, take it out of your rmd file ... in fact, eliminate everything (superfluous packages, etc.) that you *don't* think is related to the problem. That way we don't have to do so much guessing. And if one of the omissions makes the problem disappear, then that's a big hint about the problem ... – Ben Bolker Aug 01 '17 at 18:10
  • @BenBolker Thank you for the recommendations! I have removed it from the file and generated the above image from the above code when run at the command line or from a file. –  Aug 01 '17 at 18:20
  • seems like the data set is small enough that you could just `dput()` it and edit your question accordingly? – Ben Bolker Aug 01 '17 at 18:33
  • @BenBolker Will do. –  Aug 01 '17 at 18:43
  • @bouncyball That doesn't resolve the issue. –  Aug 01 '17 at 18:52
  • Can't reproduce with your structure: `ggplot(a8,aes(as.numeric(SATMTMID),as.numeric(SATVRMID)))+ geom_point()+geom_smooth(method=lm)` works fine. Adding `geom_text(aes(label=""))` doesn't make a difference. – Ben Bolker Aug 01 '17 at 21:09
  • @BenBolker Thanks for trying! –  Aug 01 '17 at 22:03

1 Answer


tl;dr: I'm not quite sure why, but I think the `text` and `text2` mappings you added are messing things up. I'm not sure what they are supposed to do in your real use case; I suspect they put every institution in its own unique group, which would leave `geom_smooth()` with a single point per group and nothing to fit. You could also try adding `aes(group=1)` to the `geom_smooth()` specification.
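
To make the `aes(group=1)` suggestion concrete, here is a minimal sketch. It assumes the grouping guess above is right. The numbers are the first five rows of the `dput()` output in the question; the `school` column is a hypothetical stand-in for `INSTNM`, filled with placeholder labels.

```r
library(ggplot2)

# Numbers are the first five rows of the question's dput() output.
# `school` is a hypothetical stand-in for INSTNM (placeholder labels only).
toy <- data.frame(
  SATMTMID = c(420, 565, 590, 430, 565),
  SATVRMID = c(424, 570, 595, 425, 555),
  school   = paste0("school_", 1:5),
  stringsAsFactors = FALSE
)

# Plot-level mapping of a discrete column, as in the question's `text = INSTNM`.
base <- ggplot(toy, aes(x = SATMTMID, y = SATVRMID, text = school)) +
  geom_point()

# aes(group = 1) asks geom_smooth() to treat all rows as a single group.
p_fixed <- base +
  geom_smooth(aes(group = 1), method = lm, formula = y ~ x)

# Number of rows computed for the smooth layer (layer 2).
nrow(layer_data(p_fixed, 2))
```

If the smooth layer comes back with rows here but is empty when `aes(group = 1)` is left out, that points to the per-institution grouping as the cause.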

I went and got the data myself. I slightly modified your cleaning pipeline (but this doesn't really do anything different from what you do above ...)

## https://collegescorecard.ed.gov/data/
library(readr)
library(ggplot2)
library(dplyr)
dd <- read_csv("MERGED2014_15_PP.csv")
dd2 <- dd %>%
    filter(STABBR=="AL") %>%
    select(SATMTMID,SATVRMID,INSTNM,CITY) %>%
    mutate(SATMTMID=as.numeric(SATMTMID),
           SATVRMID=as.numeric(SATVRMID),
           CITY=factor(CITY)) %>%
    na.omit %>%
    droplevels %>%
    mutate(CITY=reorder(CITY,SATMTMID))
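
One detail that may be worth checking against the `subset()` call in the question: in base R the third positional argument of `subset()` is `select` (a column selection), not a second row filter. The sketch below uses toy values, with numbers borrowed from the question's `dput()` output and `"NULL"` strings placed by hand for illustration. It compares that call with a combined condition and with the convert-then-`na.omit` approach used in the pipeline above.

```r
# Toy data: numbers borrowed from the question's dput(), with "NULL" strings
# placed by hand for illustration (not the real Scorecard rows).
toy <- data.frame(
  SATMTMID = c("420", "565", "NULL", "430"),
  SATVRMID = c("424", "NULL", "595", "425"),
  stringsAsFactors = FALSE
)

# The third positional argument of subset() is `select`, so only the first
# condition filters rows in this call.
partial <- subset(toy, !(SATMTMID == "NULL"), !(SATVRMID == "NULL"))

# Combining both conditions with & filters rows on both columns.
full <- subset(toy, SATMTMID != "NULL" & SATVRMID != "NULL")

# Same idea as the pipeline above: convert to numeric ("NULL" becomes NA),
# then drop every row that contains an NA.
converted <- data.frame(
  SATMTMID = suppressWarnings(as.numeric(toy$SATMTMID)),
  SATVRMID = suppressWarnings(as.numeric(toy$SATVRMID))
)
cleaned <- na.omit(converted)

nrow(partial)  # rows left by the two-argument call
nrow(full)     # rows left by the combined condition
nrow(cleaned)  # rows left after conversion and na.omit()
```

Comparing the three row counts on the real data is a quick way to see whether any `"NULL"` values survive the original cleaning step.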

Now make the plot:

library(ggrepel)
theme_set(theme_bw())  ## my preference
ggplot(dd2,aes(y = SATVRMID, x = SATMTMID))+
    geom_point(aes(color=CITY))+
    geom_smooth(method=lm)+
    geom_text_repel(aes(label=INSTNM,color=CITY))+
    labs(x="median math SAT",y="median verbal SAT")


Ben Bolker
  • Thank you, I will try this when I get to my computer. –  Aug 02 '17 at 11:56
  • This was helpful, but it turns out that the reason was that I wasn't fully cleaning my data. –  Aug 02 '17 at 13:56