I am trying to display percentages in ggplot2 using geom_line and geom_point. My code is:

print(ggplot(data=dfComb, aes(x=hour_adj, y=(..count..)/sum(..count..), group=word)) +
        geom_line(aes(colour=dfComb$word)) +
        geom_point(aes(colour=dfComb$word))
      +   ggtitle(paste("Hourly Frequencies of Tweets")) +
        xlab("Hour of Day") +
        ylab("Count") +
        scale_colour_discrete("Word", breaks=c("A","B"), labels=c("yid", "abbo")) +
        scale_y_continuous(labels = scales::percent)
        )

This errors:

Error in FUN(X[[i]], ...) : object 'count' not found

because the ..count.. variable is only created by geom_histogram (I think!) and not geom_line. Is there an easy way to use percentages with geom_line?

FYI: EDIT, my data is:

dput(dfComb)
structure(list(hour_adj = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 
22L, 23L, 0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L), count = c(44, 
24, 22, 36, 26, 18, 39, 35, 50, 46, 46, 41, 57, 49, 34, 56, 54, 
54, 49, 45, 36, 49, 43, 47, 35, 20, 18, 10, 10, 25, 25, 26, 32, 
25, 29, 39, 37, 45, 52, 43, 46, 67, 38, 69, 108, 80, 73, 48), 
    word = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor")), .Names = c("hour_adj", 
"count", "word"), row.names = c(NA, -48L), class = "data.frame")
schoon
  • Can you provide a reproducible example? Consider using the correct syntax by not explicitly calling variables from objects, but only by their name (just like in `aes`). – Roman Luštrik Sep 03 '17 at 07:40

1 Answer

You can calculate the percentages in the data frame first, grouped by word, and then map the new column to y.

Also, as per Roman Luštrik's comment, it's better to refer to variables by name inside aes() rather than with `$`.

library(dplyr)
library(ggplot2)

# sample data
set.seed(1)
dfComb <- data.frame(hour_adj = rep(0:4, 2),
                     count = sample(10:50, 10, replace = TRUE),
                     word = c(rep("A", 5), rep("B", 5)))

ggplot(dfComb %>%
         group_by(word) %>%
         mutate(perc = count/sum(count)) %>%
         ungroup(), 
       aes(x=hour_adj, y=perc, group=word, colour = word)) +
  geom_line() +
  geom_point() + 
  ggtitle(paste("Hourly Frequencies of Tweets")) +
  xlab("Hour of Day") +
  ylab("Count") +
  scale_colour_discrete("Word", breaks=c("A","B"), labels=c("yid", "abbo")) +
  scale_y_continuous(labels = scales::percent)
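
If you want to check the grouped calculation without plotting, here is a minimal sketch. It assumes the same kind of sample data as above and uses base R's `ave()` in place of the dplyr pipeline; that substitution is mine, not part of the original answer.

```r
# Sample data with the same structure as above (values are illustrative).
set.seed(1)
dfComb <- data.frame(hour_adj = rep(0:4, 2),
                     count = sample(10:50, 10, replace = TRUE),
                     word = c(rep("A", 5), rep("B", 5)))

# Share of each word's total count that falls in each hour.
dfComb$perc <- dfComb$count / ave(dfComb$count, dfComb$word, FUN = sum)

# Each word's shares should add up to 1.
tapply(dfComb$perc, dfComb$word, sum)
```

The resulting `perc` column can be mapped to y in the same way as in the ggplot call above.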

Z.Lin
  • Thanks. Can you clarify what you mean by ' it's better to call variable by name from within aes()'? – schoon Sep 03 '17 at 08:14
  • Also I get this error when I run your code: `Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘group_by’ for signature ‘"data.frame"’` – schoon Sep 03 '17 at 08:17
  • `group_by` is a function from the dplyr package, part of Hadley's tidyverse packages along with ggplot2. I'll edit my answer to include the package. It looks like your dataset currently does not have data.frame as one of its classes. You can try to coerce it using `as.data.frame()`. If that doesn't work, do consider including a sample of your actual dataset using `dput()` so that we can see what it is. – Z.Lin Sep 03 '17 at 08:21
  • Also, check out baptiste's answer & example [here](https://stackoverflow.com/a/32543753/8449629) on why it's a bad idea to use `$` in your `aes()` calls. You've already indicated the dataset as your first argument; let the geoms use variables in the same order. – Z.Lin Sep 03 '17 at 08:29
  • No need. I should have read your comment more closely: `library(dplyr)` fixed it. Thank you so much! – schoon Sep 03 '17 at 09:35
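
To illustrate the point about `$` in `aes()` raised in the comments, here is a small sketch with made-up counts (the data values and the `early` subset are assumptions for illustration only). When the plot is given a transformed or filtered data frame, bare column names follow that data, whereas `dfComb$word` would still point at the original, longer column.

```r
library(ggplot2)

# Illustrative data with the same columns as the question's dfComb.
dfComb <- data.frame(hour_adj = rep(0:4, 2),
                     count = c(12, 30, 25, 18, 40, 22, 15, 35, 28, 10),
                     word = rep(c("A", "B"), each = 5))

# Plot only the first three hours: the layer data now has 6 rows, not 10.
early <- subset(dfComb, hour_adj < 3)

# Bare names are looked up in whatever data the layer receives,
# so the colour mapping lines up with the rows being drawn.
p <- ggplot(early, aes(x = hour_adj, y = count, group = word, colour = word)) +
  geom_line() +
  geom_point()

# By contrast, aes(colour = dfComb$word) would supply 10 values for 6 rows.
```

With the `$` form I would expect ggplot2 to complain about mismatched aesthetic lengths here, though the exact message depends on the installed version.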