
I'm trying to create a stacked geom_bar plot of the cumulative number of sessions by date (per month) and by group. For some reason, even though my x variable dates starts at 2016-11-01 and ends at 2019-02-01 for both groups, the plot starts at 2015-12-01 (Dec-2015) and the values all clump together at Jan-16, Jan-17, etc.

When my dates were characters the plot worked, but then I couldn't reorder them. So I changed them to dates, and now I have the issue above.

Here is the dput() of my data, imported from an initial CSV file:

recruitment_tally<-structure(list(dates = structure(c(16811, 16812, 17167, 17168, 
                                   17169, 17170, 17171, 17172, 17173, 17174, 17175, 17176, 17177, 
                                   17178, 17532, 17533, 17534, 17535, 17536, 17537, 17538, 17539, 
                                   17540, 17541, 17542, 17543, 17897, 17898, 17899, 16811, 16812, 
                                   17167, 17168, 17169, 17170, 17171, 17172, 17173, 17174, 17175, 
                                   17176, 17177, 17178, 17532, 17533, 17534, 17535, 17536, 17537, 
                                   17538, 17539, 17540, 17541, 17542, 17543, 17897, 17898, 17899
), class = "Date"), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
                                        1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                        1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                        2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                        2L, 2L, 2L, 2L), .Label = c("control", "mtbi"), class = "factor"), 
total_sessions = c(4, 8, 11, 15, 19, 21, 27, 33, 35, 38, 
                   41, 44, 47, 48, 51, 53, 56, 58, 59, 62, 63, 63, 66, 67, 69, 
                   70, 71, 72, 73, 0, 0, 0, 2, 3, 5, 8, 10, 15, 18, 20, 27, 
                   28, 28, 32, 34, 36, 36, 39, 41, 41, 43, 49, 50, 53, 57, 58, 
                   60, 63)), row.names = c(NA, -58L), spec = structure(list(
    cols = list(
        date = structure(list(), class = c("collector_character", "collector")),
        group = structure(list(), class = c("collector_character", "collector")),
        culm_total = structure(list(), class = c("collector_double", "collector"))),
    default = structure(list(), class = c("collector_guess", "collector"))),
    class = "col_spec"), class = c("tbl_df", "tbl", "data.frame"))

Here is my ggplot code:

library(dplyr)    # provides the %>% pipe used below
library(ggplot2)

base <- recruitment_tally %>%
  ggplot() +
  geom_bar(aes(x = dates, y = total_sessions, fill = group),
           stat = "identity", position = "dodge") +
  coord_flip()

base + scale_x_date(date_breaks = "month", date_labels = "%b%y")

Thanks very much for your help!

hc999

1 Answer


I think what has happened here is that the dates are not as expected after CSV import.

The dates in your example data all fall within the first 12 days of January. I assume that what you want is the first day of each month instead. I suspect that somewhere along the way, dates written as year-day-month were read as year-month-day.

You can fix this using your data like this:

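library(dplyr)
library(ggplot2)

recruitment_tally %>% 
  # day and month were swapped on import, so re-parse the text as year-day-month
  mutate(dates = as.Date(as.character(dates), "%Y-%d-%m")) %>% 
  ggplot(aes(dates, total_sessions)) + 
    geom_col(aes(fill = group)) + 
    coord_flip() + 
    scale_x_date(date_labels = "%b %Y")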


But the better fix is to get the date format correct when importing the data.
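As a rough sketch of what fixing it at import could look like: the original CSV was not shown, so the file contents below are made up, and the column names (date, group, culm_total) are only inferred from the spec attribute in the dput() output. The idea is to read the date column as plain text and then parse it with an explicit year-day-month format. With readr, passing a col_date(format = "%Y-%d-%m") column type to read_csv should achieve the same thing.

```r
# Hypothetical CSV contents; the real file was not included in the question.
csv_text <- "date,group,culm_total
2016-01-11,control,4
2016-01-12,control,8
2017-01-01,control,11"

# Read the date column as plain text so that no format is guessed.
raw <- read.csv(text = csv_text, colClasses = c(date = "character"))

# Parse explicitly as year-day-month.
raw$date <- as.Date(raw$date, format = "%Y-%d-%m")

print(raw$date)
```

With these made-up rows, the parsed dates are the first of November 2016, December 2016 and January 2017 rather than three days in January.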

neilfws
  • Thank you!!! :) Just for my additional learning, is there a way to check this? I would usually just use class(dates), which only returns "Date" and not the format of the date. Or did you just identify the problem from looking at the code? – hc999 Mar 27 '19 at 01:12
  • @hc999: you can use `str()` or `summary()` to get an overview of the imported data, or even better, use one of the packages from this [answer](https://stackoverflow.com/a/52345926/786542) – Tung Mar 27 '19 at 02:08
  • @hc999 I figured from your description of the problem and the example data that days and months were switched, since your dates were year-month-day but had no days greater than 12. You need to know in advance whether your dates are YYYY-MM-DD or YYYY-DD-MM. Then you can specify the correct format using arguments to _e.g._ `read.csv` or better, `readr::read_csv`. – neilfws Mar 27 '19 at 02:27
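
One possible way to run the check described in these comments, as a minimal sketch: the values below are copied from the start of the question's data, and the "no day greater than 12" test is only a heuristic hint, not proof that day and month were swapped.

```r
# Illustrative values taken from the first few dates in the question.
dates <- as.Date(c("2016-01-11", "2016-01-12", "2017-01-01", "2017-01-02"))

str(dates)             # confirms the class, but not whether day and month are right
print(range(dates))    # earliest and latest date; compare with what you expect

# Pull out the day and month components as integers.
day   <- as.integer(format(dates, "%d"))
month <- as.integer(format(dates, "%m"))

print(table(month))                # here every date falls in January
all_days_le_12 <- all(day <= 12)   # TRUE hints that day and month may be swapped
print(all_days_le_12)
```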