
This follows on from my last question. I've spent an hour or so trying to work out how to pass the variable I use to filter my dataframe to the title of the graph that is generated.


library(tidyverse)
library(epitools)


# here's my made up data

DISEASE = c("Marco Polio","Marco Polio","Marco Polio","Marco Polio","Marco Polio",
            "Mumps","Mumps","Mumps","Mumps","Mumps",
            "Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox")
YEAR = c(2011, 2012, 2013, 2014, 2015,
         2011, 2012, 2013, 2014, 2015,
         2011, 2012, 2013, 2014, 2015)
VALUE = c(82,89,79,51,51,
          79,91,69,89,78,
          71,69,95,61,87)
AREA =c("A", "B","C")

DATA = data.frame(DISEASE, YEAR, VALUE,AREA)

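DATA <- DATA %>%
  mutate(POPN = case_when(
    AREA == "A" ~ 2.5,
    AREA == "B" ~ 3,
    AREA == "C" ~ 7,
    TRUE ~ 0)) %>%
  group_by(DISEASE, AREA, POPN) %>%
  count(AREA) %>%
  # pois.byar() returns a data frame (x, pt, rate, lower, upper, conf.level)
  mutate(res = list(pois.byar(n, POPN))) %>%
  # name the list-column explicitly; a bare unnest() warns in newer tidyr
  unnest(res)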

DATA%>%filter(DISEASE== "Marco Polio")%>%
  ggplot(aes(x=AREA, y=rate)) +geom_point() +
  geom_hline(aes(yintercept=rate[AREA == "A"]), 
             linetype="dashed", color = "red")

I thought that this would work:

    x_label = "Area!!!"
    y_label = "Rate!!!"
    DATA%>%filter(DISEASE== "Marco Polio")%>%
          ggplot(aes(x=AREA, y=rate)) +geom_point() +
          geom_hline(aes(yintercept=rate[AREA == "A"]), 
                     linetype="dashed", color = "red")+
labs(x = x_label,y = y_label)+
ggtitle(DATA$DISEASE)

Why doesn't it? It generates a chart for Marco Polio but uses Chicky Pox as the title.

What I want is (pseudocode) ggtitle == filter(disease)

Because what I'm going to do after this is use purrr's walk() to produce a chart for every infection, and I'd like each one to be titled automatically.

Ta.

EDIT: I've tried the suggestion below and it doesn't quite work.

I've tried this

DATA%>%filter(DISEASE== "Mumps")%>%
  ggplot(aes(x=AREA, y=rate)) +geom_point() +
  geom_hline(aes(yintercept=rate[AREA == "A"]), 
             linetype="dashed", color = "red")+
  ggtitle(paste(DISEASE))


DATA%>%filter(DISEASE== "Mumps")%>%
  ggplot(aes(x=AREA, y=rate)) +geom_point() +
  geom_hline(aes(yintercept=rate[AREA == "A"]), 
             linetype="dashed", color = "red")+
  ggtitle(as.character(DISEASE))

and no luck.

Does it have something to do with DISEASE becoming a FACTOR when it gets grouped?

damo

3 Answers


It seems like you want a function where you can input a disease and have the plot created.

disease_plot <- function(disease_of_interest) {
  DATA %>%
    filter(DISEASE == disease_of_interest) %>%
    ggplot(aes(x = AREA, y = rate)) + 
        geom_point() +
        geom_hline(aes(yintercept = rate[AREA == "A"]),
          linetype = "dashed", color = "red") +
        # labs(x = x_label, y = y_label) +
        ggtitle(disease_of_interest)
}

disease_plot("Marco Polio")
disease_plot("Chicky Pox")
disease_plot("Mumps")

Or to have them all created at once...

map(unique(DATA$DISEASE), disease_plot)
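
If the plots need to be looked up or saved by disease later, one option is to name the list that `map()` returns. This is only a sketch: `toy` is a made-up stand-in for `DATA`, and `plot_one()` is a hypothetical variant of `disease_plot()` that takes the data as an argument.

```r
library(dplyr)
library(ggplot2)
library(purrr)

# Hypothetical stand-in for the question's DATA (values are made up)
toy <- data.frame(
  DISEASE = rep(c("Chicky Pox", "Marco Polio", "Mumps"), each = 3),
  AREA    = rep(c("A", "B", "C"), times = 3),
  rate    = c(0.4, 0.3, 0.1, 0.8, 0.3, 0.2, 0.4, 0.7, 0.1)
)

# Hypothetical helper mirroring disease_plot(), with the data passed in
plot_one <- function(data, disease_of_interest) {
  data %>%
    filter(DISEASE == disease_of_interest) %>%
    ggplot(aes(x = AREA, y = rate)) +
    geom_point() +
    ggtitle(disease_of_interest)
}

# as.character() guards against DISEASE being a factor
diseases <- as.character(unique(toy$DISEASE))

# set_names() names each element after itself, so map() returns a named list
plots <- diseases %>%
  set_names() %>%
  map(~ plot_one(toy, .x))

names(plots)
```

A single plot can then be pulled out with `plots[["Mumps"]]`.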
StephenK
  • Clearly, I need to start looking at how to write functions... I think it's the way of thinking about problems. I appreciate your help and advice here Stephen and Mark. I'm going to show the function I already had and then include the ggtitle portion. I wouldn't have thought to include the "title" in the function, but I also didn't know how to do it. Hope this is ok? And makes sense? – damo Mar 25 '19 at 20:38
  • Following on from @Tung : I rechecked the answer given here Stephen. It does work. And I think I know why: before running this bit: labs(x = x_label,y = y_label) I should have defined what they were. When it didn't work, and I saw I could use the function I already had I put it together in my answer. I'll mark yours as the answer! Apologies. – damo Mar 26 '19 at 16:13

That is because you pass the entire DATA$DISEASE column as the title. It is the unfiltered column, so the title does not depend on the filter, and it seems ggtitle just grabs a single value from it. I think it is much simpler to make a filtered data frame first and then feed that into the plot.

df <- DATA%>%filter(DISEASE== "Marco Polio")

  ggplot(data = df, aes(x=AREA, y=rate)) +geom_point() +
  geom_hline(aes(yintercept=rate[AREA == "A"]), 
             linetype="dashed", color = "red")+
  labs(x = x_label,y = y_label)+
  ggtitle(df$DISEASE)
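
To illustrate the point: `DATA$DISEASE` is evaluated on the global object, not on whatever flows through the pipe, so filtering has no effect on it. A minimal sketch with a made-up stand-in (`toy` and its values are hypothetical, not the question's data) that takes the title from the filtered frame and collapses it to a single string first:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's DATA (values are made up)
toy <- data.frame(
  DISEASE = rep(c("Chicky Pox", "Marco Polio", "Mumps"), each = 3),
  AREA    = rep(c("A", "B", "C"), times = 3),
  rate    = c(0.4, 0.3, 0.1, 0.8, 0.3, 0.2, 0.4, 0.7, 0.1)
)

df <- toy %>% filter(DISEASE == "Marco Polio")

# The original column still holds every disease; the filtered one holds a single disease
n_before <- length(unique(toy$DISEASE))
n_after  <- length(unique(df$DISEASE))

# Collapse the filtered column to one string (as.character() also handles factors)
plot_title <- as.character(unique(df$DISEASE))

p <- ggplot(df, aes(x = AREA, y = rate)) +
  geom_point() +
  ggtitle(plot_title)
```

Passing a length-one string avoids relying on how `ggtitle()` treats a whole vector.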

At first I thought the solution would be to refer to plain DISEASE rather than DATA$DISEASE. However, this doesn't work as expected when filtering for another disease. I think you would have to subset DISEASE inside ggtitle() as well, or better, use the first approach above or the function from the other answer.

NOT WORKING AS EXPECTED:

DATA%>%filter(DISEASE== "Marco Polio")%>%
  ggplot(aes(x=AREA, y=rate)) +geom_point() +
  geom_hline(aes(yintercept=rate[AREA == "A"]), 
             linetype="dashed", color = "red")+
  labs(x = x_label,y = y_label)+
  ggtitle(DISEASE)
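
A likely reason this fails: `ggtitle()` does not look inside the plot's data, so a bare `DISEASE` is resolved in the calling environment. With the example data that is the unfiltered `DISEASE` vector created at the top of the question; where no such object exists, R reports `object 'DISEASE' not found`. If you want to stay in the pipeline, one option is to wrap the plotting step in braces so that `.` refers to the filtered data. A sketch, with `toy` as a made-up stand-in for `DATA`:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's DATA (values are made up)
toy <- data.frame(
  DISEASE = rep(c("Chicky Pox", "Marco Polio", "Mumps"), each = 3),
  AREA    = rep(c("A", "B", "C"), times = 3),
  rate    = c(0.4, 0.3, 0.1, 0.8, 0.3, 0.2, 0.4, 0.7, 0.1)
)

p <- toy %>%
  filter(DISEASE == "Mumps") %>%
  {
    # Inside the braces, `.` is the filtered data frame,
    # so the title is taken from the rows actually plotted
    ggplot(., aes(x = AREA, y = rate)) +
      geom_point() +
      ggtitle(as.character(unique(.$DISEASE)))
  }
```

This relies on the magrittr `%>%` pipe (re-exported by dplyr), which does not insert the left-hand side as a first argument when the right-hand side is a braced block.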
Mark
  • Frustratingly, it works on my trial data (used here). But when I test it on my actual data it doesn't. And I get the following error: Error in rlang::list2(..., title = title, subtitle = subtitle, caption = caption, : object 'DISEASE' not found – damo Mar 25 '19 at 12:17
  • I would suspect a typo, or perhaps str(df) shows a different format for some columns compared with the test data, which causes the problem? I fear I cannot help you with that based on a comment. Please accept the working answer for the test data though – Mark Mar 25 '19 at 12:21
  • Can I ask where you looked? – damo Mar 25 '19 at 12:33
  • Oh I meant looking as in reading your code, figuring out what we did wrong and changing it till it works. – Mark Mar 25 '19 at 12:55
  • Hmm. Ok. I don't want to turn this into a massive discussion, but I'll see how I can compare the two grouped dataframes (test and real data). – damo Mar 25 '19 at 14:33
  • Hmm. I've had to unmark this as an answer, in testing I've changed filter DISEASE to each of the individual diseases and the title isn't changing. – damo Mar 25 '19 at 15:08
  • yeah I tried it too, seems you are right. I think you're better off using the function structure from the other answer. I usually work like that too, was curious to try it with the %>% pipeline structure – Mark Mar 25 '19 at 19:40

In the end, I took the advice and help from both Stephen and Mark and cobbled it together with my original plan to walk and purr my way through it.

Here it is:

    # x_label and y_label must already be defined, as in the question
    walk(unique(DATA$DISEASE), function(disease_of_interest) {
      p <- DATA %>%
        filter(DISEASE == !!disease_of_interest) %>%
        ggplot(aes(x = AREA, y = rate)) +
        geom_point() +
        geom_errorbar(aes(ymin = lower, ymax = upper), width = .1) +
        geom_hline(aes(yintercept = rate[AREA == "A"]),
                   linetype = "dashed", color = "red") +
        # the scale has to be added to the plot, not to ggsave()
        scale_x_discrete(limits = c("C", "A", "B")) +
        labs(x = x_label, y = y_label) +
        ggtitle(paste0("Number of ", disease_of_interest, " in 2018"))
      print(p)
      # "drive path" is a placeholder for the real output location
      ggsave(paste("drive path", disease_of_interest, "plot.png"), plot = p)
    })
damo
  • Why did you use `!!` inside your function? – Tung Mar 26 '19 at 06:41
  • Short answer: it's in the function I have inherited to do this. – damo Mar 26 '19 at 09:14
  • Long answer : In dplyr (and in tidyeval in general) you use !! to say that you want to unquote an input so that it’s evaluated, not quoted. This gives us a function that actually does what we want. https://dplyr.tidyverse.org/articles/programming.html . So I guess it makes sure the input on the filter is correct. – damo Mar 26 '19 at 09:15
  • You don't need `!!` for the function to work. `!!` is only needed when you quote your variable using `quo`, `enquo`, `sym`, `ensym`, etc. – Tung Mar 26 '19 at 16:00
  • I tested it before I replied to you: the data I'm using here isn't the data I'm working with. The function I use on the data I'm working with does need the !! (I tested it before replying to your first question). Happy to see what you've got to say - I'm still learning! :) – damo Mar 26 '19 at 16:03
  • The function written by @StephenK worked fine `disease_plot(unique(DATA$DISEASE)[1])` – Tung Mar 26 '19 at 16:03
  • Here are some examples of `ggplot2` with `tidyeval`: https://stackoverflow.com/a/50522928/ & https://stackoverflow.com/a/52296386/ – Tung Mar 26 '19 at 16:09
  • Thanks Tung, I've explained why I hadn't marked the previous answer as the answer. I appreciate it because, I don't think it's fair that Stephen wouldn't have received any credit. Your suggestions look interesting and thanks for the explanation about !! – damo Mar 26 '19 at 16:15
  • not a problem. I'm glad that you were able to fix your problem – Tung Mar 26 '19 at 23:06
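
On the `!!` discussion above, a small sketch may help. With a plain function argument, `filter()` gives the same result with or without `!!`. One case where `!!` does change the result is when the variable shares its name with a column. Whether that is what happens in the real data is only a guess. `toy` and `same_result()` are hypothetical names used for illustration.

```r
library(dplyr)

# Hypothetical data (values are made up)
toy <- data.frame(
  DISEASE = c("Mumps", "Mumps", "Marco Polio"),
  AREA    = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Case 1: the argument name does not clash with a column, so !! is optional
same_result <- function(data, disease_of_interest) {
  with_bang    <- data %>% filter(DISEASE == !!disease_of_interest)
  without_bang <- data %>% filter(DISEASE == disease_of_interest)
  identical(with_bang, without_bang)
}
no_difference <- same_result(toy, "Mumps")

# Case 2: the variable has the same name as a column
DISEASE <- "Mumps"

# The column is compared with itself, so every row is kept
n_plain <- toy %>% filter(DISEASE == DISEASE) %>% nrow()

# !! injects the value "Mumps" first, so only matching rows are kept
n_bang <- toy %>% filter(DISEASE == !!DISEASE) %>% nrow()
```

In the second case an alternative to `!!` is the `.env` pronoun, as in `filter(DISEASE == .env$DISEASE)`.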