I am working with the R programming language.

I have a data frame called "my_data" with a date variable, "my_date", that is stored as a factor in DAY-MONTH-YEAR format. The dates look like this: 05-OCT-21

I am trying to make a time series plot of this data that shows the number of observations in each month across several years, with one line per level of another variable ("group_var"). I tried to do this using the "dplyr" and "ggplot2" packages:

    library(dplyr)
    library(ggplot2)

    new <- my_data %>%
      mutate(date = as.Date(my_date)) %>%
      group_by(group_var, month = format(date, "%Y-%m")) %>%
      summarise(count = n())

    plot <- ggplot(new) +
      geom_line(aes(x = month, y = count, color = group_var, group = group_var)) +
      scale_colour_manual(values = c("red", "green", "blue")) +
      theme(axis.text.x = element_text(angle = 90)) +
      ggtitle("my title")

Can someone please show me what I am doing wrong?

  • Your `as.Date` should be `as.Date(my_date, format = "%d-%b-%y")` based on the format shown, i.e. `as.Date("05-OCT-21", "%d-%b-%y")` returns `[1] "2021-10-05"`. – akrun Jan 18 '22 at 21:07
  • @akrun: should the as.Date statement be run in the dplyr statement? – stats_noob Jan 18 '22 at 21:09
  • What *is* going wrong? I see no errors, and without sample data we cannot reproduce what you are seeing on your console. Is this a problem with `dplyr` or `ggplot2`? Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for tips on making this question more reproducible. (Many users ignore that advice/request and keep asking questions. Please read them. Then at a minimum include sample data with `dput(.)` and the literal error text you see. Or a screenshot of the image if it is broken.) – r2evans Jan 18 '22 at 23:24
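
Putting akrun's suggestion together with the original pipeline, a minimal sketch might look like the following. Since no sample data was posted, the `my_data` below is made up purely for illustration, and the variable name `p` and the `scale_x_date()` settings are my own choices, not something from the question. The key change is the explicit `format = "%d-%b-%y"` inside `mutate()`: without a format, `as.Date()` only tries ISO-style layouts such as `%Y-%m-%d`. Note that `%b` matches month abbreviations in the current locale, so this assumes an English locale.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample data standing in for my_data (the real data was not posted)
my_data <- data.frame(
  my_date   = factor(c("05-OCT-21", "17-OCT-21", "02-NOV-21", "23-NOV-21", "09-OCT-21")),
  group_var = factor(c("a", "a", "a", "b", "b"))
)

new <- my_data %>%
  # Factor -> character, then parse day / abbreviated month / two-digit year
  mutate(date = as.Date(as.character(my_date), format = "%d-%b-%y")) %>%
  # Truncate each date to the first of its month so the x axis stays a Date
  mutate(month = as.Date(format(date, "%Y-%m-01"))) %>%
  group_by(group_var, month) %>%
  summarise(count = n(), .groups = "drop")

# One line per level of group_var, with monthly breaks on the x axis
p <- ggplot(new, aes(x = month, y = count, colour = group_var, group = group_var)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("my title")

print(p)
```

Keeping `month` as a Date rather than a "%Y-%m" string is a design choice: it lets ggplot2 space the months on a continuous time axis, so a month with no observations shows up as a gap instead of being silently dropped. If the manual colours from the question are wanted, `scale_colour_manual()` can be added back, with one colour per level of `group_var`.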
