I have a chart of web visits over time which plots daily traffic from 2014 until now. The code looks like this:

 ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
   geom_line()+
   scale_y_continuous(labels = comma)+
   ylim(0,50000)

[Image: line chart of daily page views since 2014]

As you can see, it's not a great graph. It would make more sense to break it down by month rather than by day. However, when I try this code:

 ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
   geom_line()+
   scale_y_continuous(labels = comma)+
   ylim(0,50000)+
   scale_x_date(date_breaks = "1 month", minor_breaks = "1 week", labels = date_format("%B"))

I get this error:

Error: Invalid input: date_trans works with objects of class Date only

The date field Post_Day is POSIXct. Page_Views is numeric. Data looks like:

Post_Title  Post_Day    Page_Views
Title 1     2016-05-15  139
Title 2     2016-05-15  61
Title 3     2016-05-15  79
Title 4     2016-05-16  125
Title 5     2016-05-17  374
Title 6     2016-05-17  39
Title 7     2016-05-17  464
Title 8     2016-05-17  319
Title 9     2016-05-18  84
Title 10    2016-05-18  64
Title 11    2016-05-19  433
Title 12    2016-05-19  418
Title 13    2016-05-19  124
Title 14    2016-05-19  422

I'm looking to change the x-axis from daily to monthly granularity.

Uwe
jceg316
  • Isn't this really a question about how to aggregate dates into months, rather than how to change the x axis? – C8H10N4O2 Jun 23 '17 at 13:58
  • @C8H10N4O2 yes I suppose so. I thought that's what I was doing with `scale_x_date()` but it doesn't seem to be working. – jceg316 Jun 23 '17 at 14:02
  • The use of `Post_Day > "2013-12-31"` indicates that Post_Day is a character variable. You should convert this variable to a date class using `as.Date`. Then you'd use `Post_Day > as.Date("2013-12-31")`. – lmo Jun 23 '17 at 14:04
  • you could create columns for `year(post_day)` and `month(post_day)` or you could use one of [these approaches](https://stackoverflow.com/questions/23602706/first-day-of-the-month-from-a-posixct-date-time-using-lubridate) to take the "month floor" of the date – C8H10N4O2 Jun 23 '17 at 14:04
  • @lmo R is smart enough to handle the comparison `Post_Day > "2013-12-31"` when `Post_Day` is a `Date` – C8H10N4O2 Jun 23 '17 at 14:05
  • @lmo thanks for the suggestion, but it didn't work. – jceg316 Jun 23 '17 at 14:08
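
The comments above disagree on whether the comparison `Post_Day > "2013-12-31"` says anything about the class of `Post_Day`. A minimal base R sketch, using made-up values, suggests that the comparison gives the same result for Date, POSIXct, and character input, so `class()` is the more reliable check:

```r
# Made-up values for illustration; base R only
d  <- as.Date(c("2013-12-30", "2014-01-02"))
ct <- as.POSIXct(c("2013-12-30", "2014-01-02"), tz = "UTC")
ch <- c("2013-12-30", "2014-01-02")

# For Date and POSIXct, the string on the right is converted before comparing
d_after  <- d  > "2013-12-31"
ct_after <- ct > "2013-12-31"

# For character input, the comparison is done on the text itself
ch_after <- ch > "2013-12-31"

# The comparison alone does not reveal the class; class() does
classes <- c(class(d)[1], class(ct)[1], class(ch)[1])
```

Since all three comparisons run without error, inspecting `class(APRA$Post_Day)` is the safer way to decide between `scale_x_date()` and `scale_x_datetime()`.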

2 Answers

The sample data set shown in the question has multiple data points per day. So, it needs to be aggregated day-wise anyway. For the aggregation by day or month, data.table and lubridate are used.

Create sample data

As no reproducible example is supplied, a sample data set is created:

library(data.table)
n_rows <- 5000L
n_days <- 365L*3L
set.seed(123L)
DT <- data.table(Post_Title = paste("Title", 1:n_rows),
                 Post_Day = as.Date("2014-01-01") + sample(0:n_days, n_rows, replace = TRUE),
                 Page_Views = round(abs(rnorm(n_rows, 500, 200))))[order(Post_Day)]
DT
      Post_Title   Post_Day Page_Views
   1:   Title 74 2014-01-01        536
   2:  Title 478 2014-01-01        465
   3: Title 3934 2014-01-01        289
   4: Title 4136 2014-01-01        555
   5:  Title 740 2014-01-02        442
  ---                                 
4996: Title 1478 2016-12-31        586
4997: Title 2251 2016-12-31        467
4998: Title 2647 2016-12-31        468
4999: Title 3243 2016-12-31        498
5000: Title 4302 2016-12-31        309

Plot raw data

Without aggregation the data can be plotted by

library(ggplot2)
ggplot(DT) + aes(Post_Day, Page_Views) + geom_line()

[Image: line plot of the raw data]

Aggregated by day

ggplot(DT[, .(Page_Views = sum(Page_Views)), by = Post_Day]) + 
  aes(Post_Day, Page_Views) + geom_line()

To aggregate by day, the grouping parameter by of data.table is used, with sum() as the aggregation function. The aggregation reduces the number of data points from 5000 to 1087. Hence, the plot looks less cluttered.
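
As a small worked example of the same grouping, the sketch below uses the first four rows of the sample data from the question. The name `DT_small` is only chosen for illustration.

```r
library(data.table)

# First four rows of the question's sample data (three posts on one day)
DT_small <- data.table(
  Post_Day   = as.Date(c("2016-05-15", "2016-05-15", "2016-05-15", "2016-05-16")),
  Page_Views = c(139, 61, 79, 125)
)

# One row per day: sum the page views within each Post_Day group
daily <- DT_small[, .(Page_Views = sum(Page_Views)), by = Post_Day]
```

The three posts from 2016-05-15 collapse into a single row, so `daily` has two rows instead of four.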

[Image: line plot of page views aggregated by day]

Aggregated by month

ggplot(DT[, .(Page_Views = sum(Page_Views)), 
          by = .(Post_Month = lubridate::floor_date(Post_Day, "month"))]) + 
  aes(Post_Month, Page_Views) + geom_line()

To aggregate by month, the grouping parameter by is used again, but this time Post_Day is mapped to the first day of its month. So, 2014-03-26 becomes a Post_Month of 2014-03-01. floor_date() keeps the class of its input, so Post_Month is of class Date here (it would stay POSIXct for the OP's POSIXct column). As a result, the x-axis remains continuous with a date scale. This avoids the trouble of converting Post_Day to a character or factor label, e.g., "2014-03" using format(Post_Day, "%Y-%m"), where the x-axis would become discrete.
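
The month floor can be checked on single values. This is a minimal sketch with made-up timestamps, assuming lubridate is installed:

```r
library(lubridate)

# Made-up values: one Date and one POSIXct from the same day
d  <- as.Date("2014-03-26")
ct <- as.POSIXct("2014-03-26 14:30:00", tz = "UTC")

# Both are mapped to the first day of March 2014
month_d  <- floor_date(d, "month")
month_ct <- floor_date(ct, "month")

# The class of the input is kept in both cases
class_d  <- class(month_d)[1]
class_ct <- class(month_ct)[1]
```

Because the result is still a date or date-time, it can be used directly with `scale_x_date()` or `scale_x_datetime()` respectively.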

[Image: line plot of page views aggregated by month]

Uwe
  • I am not sure this is the correct solution to the problem. Sounds like the question is about changing the labels on the axis, not aggregating the data. ggplot2::scale_x_date also has scale_x_datetime options - having the posixct date as a date-time rather than a date would give the error shown, and is easily fixed by changing scale_x_date to scale_x_datetime – LucieCBurgess Oct 15 '20 at 14:34
  • @LucieCBurgess Using `scale_x_datetime()` instead of `scale_x_date()` will make the error message go away but will *not* solve the underlying problem. The misunderstanding is that changing the scale to month would automatically aggregate the data by month as well - which it does not. In OP's words *As you can see it's not a great graph, what would make a bit more sense is to break it down by month as opposed to day.* OP's data contain multiple entries per day which need to be aggregated by day, anyway. Apparently, the OP was happy with my answer and has accepted it. – Uwe Oct 16 '20 at 16:35
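
The point made in the two comments above can be illustrated with a short sketch. The data frame `posts` is made up (two posts per day for 60 days, stored as POSIXct), and `layer_data()` is used only to count the rows that reach the line layer:

```r
library(ggplot2)

# Made-up POSIXct data: two posts per day for 60 days
posts <- data.frame(
  Post_Day   = as.POSIXct("2016-05-01", tz = "UTC") + 86400 * rep(0:59, each = 2),
  Page_Views = rep(c(100, 300), times = 60)
)

# scale_x_datetime() accepts POSIXct, so the error from the question does not occur
p <- ggplot(posts, aes(x = Post_Day, y = Page_Views)) +
  geom_line() +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%B")

# The scale only changes breaks and labels; every row is still drawn
n_plotted <- nrow(layer_data(p, 1))
```

Here `n_plotted` equals `nrow(posts)`, which suggests that changing the scale relabels the axis but leaves the aggregation to be done separately.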
0
APRA$month <- as.factor(strftime(APRA$Post_Day, "%m"))
APRA       <- APRA[order(as.numeric(APRA$month)), ]

This adds a month column to your data.

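# Sum Page_Views within each month (sapply returns a named numeric vector)
z <- sapply(split(APRA, APRA$month), function(x) sum(as.numeric(x$Page_Views)))
# Turn the named vector into a data frame with one row per month
z <- data.frame(month = names(z), Page_Views = as.numeric(z))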

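This creates a data frame z with one row per month and the total page views for that month. Because only the month number is kept, the same month from different years is summed together.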

Now plot it

# group = 1 is needed because month is discrete
ggplot(z, aes(x = month, y = Page_Views, group = 1)) + geom_line()
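
If months from different years should stay separate, one possible base R variant is to group by the first day of each month instead of the month number. This is a sketch on made-up data; the names `visits` and `monthly` are hypothetical:

```r
# Made-up data spanning two years
visits <- data.frame(
  Post_Day   = as.Date(c("2015-05-15", "2015-05-20", "2016-05-15", "2016-06-01")),
  Page_Views = c(10, 20, 30, 40)
)

# First day of each month, kept as Date so the x-axis stays continuous
visits$Post_Month <- as.Date(format(visits$Post_Day, "%Y-%m-01"))

# One row per year-month with the summed page views
monthly <- aggregate(Page_Views ~ Post_Month, data = visits, FUN = sum)
```

May 2015 and May 2016 end up in separate rows, and `Post_Month` can be mapped to the x-axis without `group = 1`.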

Please let me know if this is what you were looking for. Also, I haven't run this code, so please tell me if it throws an error.

Kalees Waran