I currently have a ggplot2 chart which looks like:

[chart image]

The dataframe looks like:

           city weekday     time avg_wait_time           timestamp
1511 Boston Tuesday 09:06 AM          0.20 2001-01-04 09:06:00
1512 Boston Tuesday 09:07 AM          0.20 2001-01-04 09:07:00
1513 Boston Tuesday 09:08 AM          0.20 2001-01-04 09:08:00
1514 Boston Tuesday 09:09 AM          3.17 2001-01-04 09:09:00
1515 Boston Tuesday 09:10 AM          3.17 2001-01-04 09:10:00
1516 Boston Tuesday 09:11 AM          3.17 2001-01-04 09:11:00
1517 Boston Tuesday 09:12 AM          3.17 2001-01-04 09:12:00
1518 Boston Tuesday 09:13 AM          5.87 2001-01-04 09:13:00
1519 Boston Tuesday 09:14 AM          5.87 2001-01-04 09:14:00
1520 Boston Tuesday 09:15 AM          5.87 2001-01-04 09:15:00
1521 Boston Tuesday 09:16 AM          5.87 2001-01-04 09:16:00
1522 Boston Tuesday 09:17 AM          9.17 2001-01-04 09:17:00
1523 Boston Tuesday 09:18 AM          9.17 2001-01-04 09:18:00
1524 Boston Tuesday 09:19 AM         12.20 2001-01-04 09:19:00
1525 Boston Tuesday 09:20 AM         12.20 2001-01-04 09:20:00
1526 Boston Tuesday 09:21 AM         12.10 2001-01-04 09:21:00
1527 Boston Tuesday 09:23 AM         13.70 2001-01-04 09:23:00
1528 Boston Tuesday 09:24 AM         13.70 2001-01-04 09:24:00
1529 Boston Tuesday 09:25 AM         15.30 2001-01-04 09:25:00
1530 Boston Tuesday 09:26 AM         15.30 2001-01-04 09:26:00
1531 Boston Tuesday 09:27 AM         16.90 2001-01-04 09:27:00
1532 Boston Tuesday 09:28 AM         16.90 2001-01-04 09:28:00
1533 Boston Tuesday 09:29 AM         18.33 2001-01-04 09:29:00
1534 Boston Tuesday 09:30 AM         18.33 2001-01-04 09:30:00
1535 Boston Tuesday 09:31 AM         16.90 2001-01-04 09:31:00
1536 Boston Tuesday 09:32 AM         16.90 2001-01-04 09:32:00
1537 Boston Tuesday 09:33 AM         18.57 2001-01-04 09:33:00
1538 Boston Tuesday 09:34 AM         18.57 2001-01-04 09:34:00
1539 Boston Tuesday 09:35 AM         21.73 2001-01-04 09:35:00
1540 Boston Tuesday 09:36 AM         21.73 2001-01-04 09:36:00

The classes for this dataframe are:

> sapply(x_output, class)
$city
[1] "factor"

$weekday
[1] "factor"

$time
[1] "character"

$avg_wait_time
[1] "numeric"

$timestamp
[1] "POSIXct" "POSIXt" 

As you can see from the chart, there is too much whitespace because ggplot2 is plotting a full 24-hour day. The data covers one business week (2001-01-01 to 2001-01-05; an "average" week, with these specific dates set arbitrarily), and business hours are 9am to 6pm.

How can I make ggplot2 plot between 9am and 6pm exclusively?

Here's my attempt:

p <- ggplot(x_output, aes(x=timestamp, y=avg_wait_time, group=city)) +
  geom_line(aes(color=city), size=1.5) + 
  theme(axis.text.x = element_text(angle = 90, hjust=1), 
        legend.position = "bottom") + 
  labs(x=NULL, y="Waiting time (minutes)") + 
  facet_wrap( ~ weekday, ncol=5) + 
  scale_x_datetime(breaks = date_breaks("1 hour")), 
                   limits = c(as.POSIXct("9:00"), as.POSIXct("18:00"))

print(p)

I think if I can just configure the limits part correctly, ggplot2 will plot properly. Thanks for your help.
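Two things stand out in the attempt: `as.POSIXct("9:00")` has no date part, so it cannot be parsed, and `limits` sits outside the parentheses of `scale_x_datetime()`. Below is a minimal sketch for a single day. It assumes ggplot2 2.0 or later (for `date_breaks` and `date_labels`) and timestamps stored in GMT; the `demo` data frame is made up for illustration.

```r
library(ggplot2)

# Hypothetical single-day data standing in for one weekday of x_output
demo <- data.frame(
  timestamp = seq(as.POSIXct("2001-01-04 09:00", tz = "GMT"),
                  as.POSIXct("2001-01-04 18:00", tz = "GMT"),
                  by = "30 min")
)
demo$avg_wait_time <- seq_along(demo$timestamp)

# Limits need a full date-time, in the same time zone as the data
lims <- as.POSIXct(c("2001-01-04 09:00", "2001-01-04 18:00"), tz = "GMT")

p_demo <- ggplot(demo, aes(x = timestamp, y = avg_wait_time)) +
  geom_line() +
  scale_x_datetime(date_breaks = "1 hour",
                   date_labels = "%H:%M",
                   limits = lims)
```

With one facet per weekday, each panel covers a different date, so a single pair of limits like this cannot fit every panel at once.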

UPDATE: I just got a step closer by using (see scales part):

p <- ggplot(x_output, aes(x=timestamp, y=avg_wait_time, group=city)) +
  geom_line(aes(color=city), size=1.5) + 
  theme(axis.text.x = element_text(angle = 90, hjust=1),
        legend.position = "bottom") + 
  labs(x=NULL, y="Waiting time (minutes)") + 
  facet_wrap( ~ weekday, ncol=5, scales="free") +
  scale_x_datetime(breaks = date_breaks("1 hour"), 
                   labels=date_format("%H:%M"))

Now my output looks like:

[chart image]

However, as you can tell from the image, the axis is wrong and doesn't reflect the data. The data looks like (I've also put a copy here: http://pastebin.com/sDhzkHPC):

> r <- r[order(r$timestamp),]
> head(r,15)
       city  weekday     time avg_wait_time           timestamp
2714 Boston Thursday 10:04 AM          0.00 2001-01-06 10:04:00
2715 Boston Thursday 10:05 AM          0.00 2001-01-06 10:05:00
2716 Boston Thursday 10:06 AM          0.23 2001-01-06 10:06:00
2717 Boston Thursday 10:07 AM          0.23 2001-01-06 10:07:00
2718 Boston Thursday 10:08 AM          3.33 2001-01-06 10:08:00
2719 Boston Thursday 10:09 AM          3.33 2001-01-06 10:09:00
2720 Boston Thursday 10:10 AM          4.80 2001-01-06 10:10:00
2721 Boston Thursday 10:11 AM          4.80 2001-01-06 10:11:00
2722 Boston Thursday 10:12 AM          6.33 2001-01-06 10:12:00
2723 Boston Thursday 10:13 AM          6.33 2001-01-06 10:13:00
2724 Boston Thursday 10:14 AM          7.90 2001-01-06 10:14:00
2725 Boston Thursday 10:15 AM          7.90 2001-01-06 10:15:00
2726 Boston Thursday 10:16 AM          9.50 2001-01-06 10:16:00
2727 Boston Thursday 10:17 AM          9.50 2001-01-06 10:17:00
2728 Boston Thursday 10:18 AM         12.17 2001-01-06 10:18:00
> tail(r,15)
       city  weekday     time avg_wait_time           timestamp
2699 Boston Thursday 05:41 PM         22.07 2001-01-06 17:41:00
2700 Boston Thursday 05:42 PM         23.47 2001-01-06 17:42:00
2701 Boston Thursday 05:43 PM         23.47 2001-01-06 17:43:00
2702 Boston Thursday 05:44 PM         24.90 2001-01-06 17:44:00
2703 Boston Thursday 05:45 PM         24.90 2001-01-06 17:45:00
2704 Boston Thursday 05:46 PM         26.43 2001-01-06 17:46:00
2705 Boston Thursday 05:47 PM         19.33 2001-01-06 17:47:00
2706 Boston Thursday 05:49 PM         12.23 2001-01-06 17:49:00
2707 Boston Thursday 05:50 PM          0.00 2001-01-06 17:50:00
2708 Boston Thursday 05:52 PM          0.23 2001-01-06 17:52:00
2709 Boston Thursday 05:54 PM          1.80 2001-01-06 17:54:00
2710 Boston Thursday 05:55 PM          1.80 2001-01-06 17:55:00
2711 Boston Thursday 05:56 PM          0.00 2001-01-06 17:56:00
2712 Boston Thursday 05:57 PM          0.00 2001-01-06 17:57:00
2713 Boston Thursday 05:58 PM          0.00 2001-01-06 17:58:00

UPDATE 2: It was a timezone issue.

I solved it using this resource: "plotting times with ggplot: added hour in plot maybe due to daylight saving?"

More specifically, I did:

tz(x_output$timestamp) <- "GMT" # "America/New_York"

The commented-out value is left over from trying several different time zones to see which one matched (they can be found here: http://www.inside-r.org/packages/cran/lubridate/docs/tz).
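The effect of a time zone mismatch can be illustrated in base R. This is only a sketch, not the exact fix above: it assumes the clock times were recorded as GMT and shows how the same instant gets a different label in another zone, which is one way an axis can end up shifted by whole hours.

```r
# One instant, taken from the sample data and assumed to be GMT
x <- as.POSIXct("2001-01-06 10:04:00", tz = "GMT")

# The same instant labelled in two different zones
gmt_label <- format(x, "%H:%M", tz = "GMT")
ny_label  <- format(x, "%H:%M", tz = "America/New_York")

# The same clock time read as New York time is a different, later instant
x_ny <- as.POSIXct("2001-01-06 10:04:00", tz = "America/New_York")
offset_hours <- as.numeric(difftime(x_ny, x, units = "hours"))
```

Here `gmt_label` is "10:04" and `ny_label` is "05:04", because New York is five hours behind GMT in January.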

I will keep this question up for posterity.

Alex Petralia

1 Answer

My approach would be not to use the timestamp for the x axis but rather use just the time.

x_output$time <- strftime(x_output$timestamp, format="%H:%M")

Then setting up the ggplot using this call with the same weekday facet should work.

p <- ggplot(x_output, aes(x=time, y=avg_wait_time, group=city))

NB: Don't have access to an R environment right now so I cannot directly verify if this code works but in principle it should :)
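A fuller sketch of this idea, using made-up data in place of `x_output` (the `demo` data frame and its values are hypothetical). Passing `tz` explicitly to `strftime()` is an assumption added here, so that the extracted clock time does not depend on the session's time zone.

```r
library(ggplot2)

# Hypothetical stand-in for x_output: two weekdays, five minutes each
demo <- data.frame(
  city = "Boston",
  weekday = rep(c("Monday", "Tuesday"), each = 5),
  timestamp = c(
    seq(as.POSIXct("2001-01-01 09:00", tz = "GMT"), by = "1 min", length.out = 5),
    seq(as.POSIXct("2001-01-02 09:00", tz = "GMT"), by = "1 min", length.out = 5)
  ),
  avg_wait_time = c(0.2, 0.2, 3.17, 3.17, 5.87, 0, 0.23, 3.33, 4.8, 6.33)
)

# Keep only the clock time; the date is dropped, so all facets share one axis
demo$time <- strftime(demo$timestamp, format = "%H:%M", tz = "GMT")

p_time <- ggplot(demo, aes(x = time, y = avg_wait_time, group = city)) +
  geom_line(aes(color = city)) +
  facet_wrap(~ weekday, ncol = 5)
```

Because `time` is a character column, ggplot2 treats the x axis as discrete. The zero-padded "HH:MM" strings sort alphabetically in chronological order, which keeps the line in the right sequence.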

CMichael
  • This works but I think I should keep it as timestamps, as brought up in the answer here: http://stackoverflow.com/a/31443322/3247120. Is that possible? – Alex Petralia Jul 17 '15 at 16:56
  • I saw you fixed everything yourself - well done! However, I want to stress the fact that my solution is still time-coded so the issue described in your link will not emerge. – CMichael Jul 18 '15 at 06:16
  • NB: Given the large number of extra calls and packages I think that my answer is a valuable quick fix as well - will leave it here. – CMichael Jul 18 '15 at 06:17
  • I tried your method but since `time` is a character column, I can't easily set the number of breaks on the x-axis (for example as compared to `scale_x_datetime(breaks=date_breaks("1 hour"))`). I tried `scale_x_discrete(breaks = pretty_breaks(n=30))` instead but that doesn't seem to work either - I receive `In pretty.default(x, n, ...) : NAs introduced by coercion`. I'm not sure how `time` as you said is still "time-coded"? – Alex Petralia Jul 18 '15 at 21:46
  • I ended up going with your solution (mine was a bit finicky). I solved the `breaks` problem by manually entering the breaks and labels (e.g. `brks <- c("09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00", "17:59"); brks_labels <- c("9:00 am", "10:00 am", "11:00 am", "12:00 pm", "1:00 pm", "2:00 pm", "3:00 pm", "4:00 pm", "5:00 pm", "6:00pm")`). – Alex Petralia Jul 18 '15 at 22:35
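The manual breaks from the last comment can be passed to `scale_x_discrete()`. The sketch below shows one plausible way to wire them up. The `demo` data is invented for illustration, and the comment does not show the exact call that was used, so the plot code is an assumption.

```r
library(ggplot2)

# Hypothetical data: one reading every 30 minutes from 09:00 to 17:30, plus 17:59
stamps <- seq(as.POSIXct("2001-01-01 09:00", tz = "GMT"),
              as.POSIXct("2001-01-01 17:30", tz = "GMT"),
              by = "30 min")
times <- c(format(stamps, "%H:%M", tz = "GMT"), "17:59")
demo <- data.frame(time = times,
                   avg_wait_time = seq_along(times),
                   city = "Boston")

# Breaks must match values present in the data; labels are paired by position
brks <- c("09:00", "10:00", "11:00", "12:00", "13:00",
          "14:00", "15:00", "16:00", "17:00", "17:59")
brks_labels <- c("9:00 am", "10:00 am", "11:00 am", "12:00 pm", "1:00 pm",
                 "2:00 pm", "3:00 pm", "4:00 pm", "5:00 pm", "6:00pm")

p_brks <- ggplot(demo, aes(x = time, y = avg_wait_time, group = city)) +
  geom_line() +
  scale_x_discrete(breaks = brks, labels = brks_labels)
```

A break for a time that does not occur in the data would not be drawn on a discrete axis, which may be why the last break is "17:59" rather than "18:00".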