I currently have a ggplot2 chart which looks like:
The dataframe looks like:
city weekday time avg_wait_time timestamp
1511 Boston Tuesday 09:06 AM 0.20 2001-01-04 09:06:00
1512 Boston Tuesday 09:07 AM 0.20 2001-01-04 09:07:00
1513 Boston Tuesday 09:08 AM 0.20 2001-01-04 09:08:00
1514 Boston Tuesday 09:09 AM 3.17 2001-01-04 09:09:00
1515 Boston Tuesday 09:10 AM 3.17 2001-01-04 09:10:00
1516 Boston Tuesday 09:11 AM 3.17 2001-01-04 09:11:00
1517 Boston Tuesday 09:12 AM 3.17 2001-01-04 09:12:00
1518 Boston Tuesday 09:13 AM 5.87 2001-01-04 09:13:00
1519 Boston Tuesday 09:14 AM 5.87 2001-01-04 09:14:00
1520 Boston Tuesday 09:15 AM 5.87 2001-01-04 09:15:00
1521 Boston Tuesday 09:16 AM 5.87 2001-01-04 09:16:00
1522 Boston Tuesday 09:17 AM 9.17 2001-01-04 09:17:00
1523 Boston Tuesday 09:18 AM 9.17 2001-01-04 09:18:00
1524 Boston Tuesday 09:19 AM 12.20 2001-01-04 09:19:00
1525 Boston Tuesday 09:20 AM 12.20 2001-01-04 09:20:00
1526 Boston Tuesday 09:21 AM 12.10 2001-01-04 09:21:00
1527 Boston Tuesday 09:23 AM 13.70 2001-01-04 09:23:00
1528 Boston Tuesday 09:24 AM 13.70 2001-01-04 09:24:00
1529 Boston Tuesday 09:25 AM 15.30 2001-01-04 09:25:00
1530 Boston Tuesday 09:26 AM 15.30 2001-01-04 09:26:00
1531 Boston Tuesday 09:27 AM 16.90 2001-01-04 09:27:00
1532 Boston Tuesday 09:28 AM 16.90 2001-01-04 09:28:00
1533 Boston Tuesday 09:29 AM 18.33 2001-01-04 09:29:00
1534 Boston Tuesday 09:30 AM 18.33 2001-01-04 09:30:00
1535 Boston Tuesday 09:31 AM 16.90 2001-01-04 09:31:00
1536 Boston Tuesday 09:32 AM 16.90 2001-01-04 09:32:00
1537 Boston Tuesday 09:33 AM 18.57 2001-01-04 09:33:00
1538 Boston Tuesday 09:34 AM 18.57 2001-01-04 09:34:00
1539 Boston Tuesday 09:35 AM 21.73 2001-01-04 09:35:00
1540 Boston Tuesday 09:36 AM 21.73 2001-01-04 09:36:00
The classes for this dataframe are:
> sapply(x_output, class)
$city
[1] "factor"
$weekday
[1] "factor"
$time
[1] "character"
$avg_wait_time
[1] "numeric"
$timestamp
[1] "POSIXct" "POSIXt"
As you can see from the graphs, there is too much whitespace because ggplot2 is plotting a full 24-hour day. The data covers one business week (2001-01-01 to 2001-01-05; an "average" week, with these specific dates set arbitrarily), and business hours are between 9am and 6pm.
How can I make ggplot2 plot between 9am and 6pm exclusively?
Here's my attempt:
library(ggplot2)
library(scales)  # date_breaks()
p <- ggplot(x_output, aes(x=timestamp, y=avg_wait_time, group=city)) +
  geom_line(aes(color=city), size=1.5) +
  theme(axis.text.x = element_text(angle = 90, hjust=1),
        legend.position = "bottom") +
  labs(x=NULL, y="Waiting time (minutes)") +
  facet_wrap( ~ weekday, ncol=5) +
  scale_x_datetime(breaks = date_breaks("1 hour"),
                   limits = c(as.POSIXct("9:00"), as.POSIXct("18:00")))
print(p)
I think if I can just configure the limits part correctly, ggplot2 will plot properly. Thanks for your help.
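For reference, a minimal sketch of what a valid limits argument might look like. It assumes a single day of data and a reasonably recent ggplot2 (one where scale_x_datetime() accepts date_breaks, date_labels and timezone). The toy data frame and the name lims are made up for illustration and are not my real x_output. The main point is that as.POSIXct("9:00") raises an error because there is no date part, so the limits need to be full date-times in the same time zone as the data. With one facet per weekday, a single pair of limits would only cover one of the days.

```r
library(ggplot2)

# Hypothetical toy data for a single day (one row per minute, 09:00 to 17:59).
toy <- data.frame(
  timestamp = as.POSIXct("2001-01-04 09:00:00", tz = "GMT") + 60 * (0:539),
  avg_wait_time = seq(0, 20, length.out = 540)
)

# Limits must be complete date-times; a bare "9:00" cannot be parsed.
lims <- as.POSIXct(c("2001-01-04 09:00:00", "2001-01-04 18:00:00"), tz = "GMT")

p <- ggplot(toy, aes(x = timestamp, y = avg_wait_time)) +
  geom_line() +
  scale_x_datetime(limits = lims,
                   date_breaks = "1 hour",
                   date_labels = "%H:%M",
                   timezone = "GMT")  # label in the same zone as the data
print(p)
```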
UPDATE: I just got a step closer by using the following (see the scales part):
p <- ggplot(x_output, aes(x=timestamp, y=avg_wait_time, group=city)) +
  geom_line(aes(color=city), size=1.5) +
  theme(axis.text.x = element_text(angle = 90, hjust=1),
        legend.position = "bottom") +
  labs(x=NULL, y="Waiting time (minutes)") +
  facet_wrap( ~ weekday, ncol=5, scales="free") +
  scale_x_datetime(breaks = date_breaks("1 hour"),
                   labels=date_format("%H:%M"))
Now my output looks like:
However, as you can tell from the image, the axis is wrong and doesn't reflect the data. The data looks like (I've also put a copy here: http://pastebin.com/sDhzkHPC):
> r <- r[order(r$timestamp),]
> head(r,15)
city weekday time avg_wait_time timestamp
2714 Boston Thursday 10:04 AM 0.00 2001-01-06 10:04:00
2715 Boston Thursday 10:05 AM 0.00 2001-01-06 10:05:00
2716 Boston Thursday 10:06 AM 0.23 2001-01-06 10:06:00
2717 Boston Thursday 10:07 AM 0.23 2001-01-06 10:07:00
2718 Boston Thursday 10:08 AM 3.33 2001-01-06 10:08:00
2719 Boston Thursday 10:09 AM 3.33 2001-01-06 10:09:00
2720 Boston Thursday 10:10 AM 4.80 2001-01-06 10:10:00
2721 Boston Thursday 10:11 AM 4.80 2001-01-06 10:11:00
2722 Boston Thursday 10:12 AM 6.33 2001-01-06 10:12:00
2723 Boston Thursday 10:13 AM 6.33 2001-01-06 10:13:00
2724 Boston Thursday 10:14 AM 7.90 2001-01-06 10:14:00
2725 Boston Thursday 10:15 AM 7.90 2001-01-06 10:15:00
2726 Boston Thursday 10:16 AM 9.50 2001-01-06 10:16:00
2727 Boston Thursday 10:17 AM 9.50 2001-01-06 10:17:00
2728 Boston Thursday 10:18 AM 12.17 2001-01-06 10:18:00
> tail(r,15)
city weekday time avg_wait_time timestamp
2699 Boston Thursday 05:41 PM 22.07 2001-01-06 17:41:00
2700 Boston Thursday 05:42 PM 23.47 2001-01-06 17:42:00
2701 Boston Thursday 05:43 PM 23.47 2001-01-06 17:43:00
2702 Boston Thursday 05:44 PM 24.90 2001-01-06 17:44:00
2703 Boston Thursday 05:45 PM 24.90 2001-01-06 17:45:00
2704 Boston Thursday 05:46 PM 26.43 2001-01-06 17:46:00
2705 Boston Thursday 05:47 PM 19.33 2001-01-06 17:47:00
2706 Boston Thursday 05:49 PM 12.23 2001-01-06 17:49:00
2707 Boston Thursday 05:50 PM 0.00 2001-01-06 17:50:00
2708 Boston Thursday 05:52 PM 0.23 2001-01-06 17:52:00
2709 Boston Thursday 05:54 PM 1.80 2001-01-06 17:54:00
2710 Boston Thursday 05:55 PM 1.80 2001-01-06 17:55:00
2711 Boston Thursday 05:56 PM 0.00 2001-01-06 17:56:00
2712 Boston Thursday 05:57 PM 0.00 2001-01-06 17:57:00
2713 Boston Thursday 05:58 PM 0.00 2001-01-06 17:58:00
UPDATE 2: It was a timezone issue.
I solved it using this resource: "plotting times with ggplot: added hour in plot maybe due to daylight saving?"
More specifically, I did:
library(lubridate)  # provides tz<-
tz(x_output$timestamp) <- "GMT" # "America/New_York"
The commented-out value is left over from trying several different time zones to see which one matched (the available names can be found here: http://www.inside-r.org/packages/cran/lubridate/docs/tz).
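For anyone hitting the same thing, here is a minimal base-R sketch of why the axis can end up shifted. It assumes the timestamps were stored in a US Eastern time zone while the labels were rendered in UTC (as far as I can tell, UTC is the default tz of date_format() in scales). The example timestamp is the first row of the Tuesday data above; the variable names are just for illustration.

```r
# A timestamp stored in US Eastern time (EST is UTC-5 in January).
ts_ny <- as.POSIXct("2001-01-04 09:06:00", tz = "America/New_York")

# Labelling the same instant in UTC shifts the clock time by five hours.
shown_utc <- format(ts_ny, "%H:%M", tz = "UTC")  # "14:06"

# Re-interpreting the clock time as GMT keeps 09:06 on the axis, which is
# the effect I was after with tz(x) <- "GMT".
ts_gmt <- as.POSIXct(format(ts_ny, "%Y-%m-%d %H:%M:%S"), tz = "GMT")
fixed <- format(ts_gmt, "%H:%M", tz = "UTC")     # "09:06"

print(c(shown_utc = shown_utc, fixed = fixed))
```

An alternative that leaves the data untouched might be to pass the data's time zone to the labeller, for example date_format("%H:%M", tz = "America/New_York").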
I will keep this question up for posterity.