
Say we have the following simple data frame of date-value pairs, where some dates are missing from the sequence (here, Jan 12 through Jan 14). When I plot the points, the missing dates still appear on the x-axis, but there are no points corresponding to them. I want to keep these missing dates off the x-axis, so that the point sequence has no breaks. Any suggestions on how to do this? Thanks!

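library(ggplot2)
dts <- as.Date(c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts))
# Originally written for pre-0.9 ggplot2 as scale_x_date(format = '%d%b', major = 'days')
ggplot(df, aes(dt, val)) + geom_point() +
        scale_x_date(date_labels = '%d%b', date_breaks = '1 day')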


Prasad Chalasani

3 Answers


I made a package that does this. It's called bdscale and it's on CRAN and GitHub. Shameless plug.

To replicate your example:

> library(bdscale)
> library(ggplot2)
> library(scales)
> dts <- as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
> df <- data.frame(dt=dts, val=seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() + 
    scale_x_bd(business.dates=dts, labels=date_format('%d%b'))


But what you probably want is to load known valid dates, then plot your data using the valid dates on the x-axis:

> nyse <- bdscale::yahoo('SPY') # get valid dates from SPY prices
> dts <- as.Date('2011-01-10') + 1:10
> df <- data.frame(dt=dts, val=seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() + 
    scale_x_bd(business.dates=nyse, labels=date_format('%d%b'), max.major.breaks=10)

Warning message:
Removed 3 rows containing missing values (geom_point). 


The warning is telling you that it removed three dates:

  • 15th = Saturday
  • 16th = Sunday
  • 17th = MLK Day
dvmlls

Turn the dates into a factor, then. At the moment, ggplot is treating the data as what you told it they are: values on a continuous date scale. You don't want that scale; you want a categorical one:

library(ggplot2)
dts <- as.Date( c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts)) 
# Originally written for pre-0.9 ggplot2 as scale_x_date(format = '%d%b', major = 'days')
ggplot(df, aes(dt,val)) + geom_point() + 
        scale_x_date(date_labels = '%d%b', date_breaks = '1 day')

versus

df <- data.frame(dt = factor(format(dts, format = '%d%b')), 
                  val = seq_along(dts)) 
ggplot(df, aes(dt,val)) + geom_point()

which produces a plot with one evenly spaced tick per observed date and no gaps for the missing dates.

Is that what you wanted?
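If the date column should stay a `Date`, one possible variation (a sketch, not part of the original answer) is to plot against the position of each date among the observed dates and only label the ticks with the formatted dates. The column name `pos` and the helper vector `obs` are made up for this illustration.

```r
library(ggplot2)

dts <- as.Date(c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts))

# Distinct observed dates in chronological order
obs <- sort(unique(df$dt))

# Position of each row's date among the observed dates (1, 2, 3, ...)
df$pos <- match(df$dt, obs)

# Plot against the position, but label the ticks with the real dates
p <- ggplot(df, aes(pos, val)) + geom_point() +
  scale_x_continuous(breaks = seq_along(obs), labels = format(obs, '%d%b'))
p
```

Here `df$dt` is untouched, so date arithmetic and filtering still work; only the x-axis is categorical in effect.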

Gavin Simpson
  • @Gavin, thanks... but that changes the dates being displayed: I had 10Jan thru 16Jan, now we get 2Jan thru 5Jan. Any way to fix that? I guess I could go with treating the dates as strings, and completely lose date semantics, but is there a way where I don't lose date semantics? – Prasad Chalasani Mar 02 '11 at 15:14
  • But those are consecutive dates, and not the ones from the original question. – Dirk Eddelbuettel Mar 02 '11 at 15:14
  • I noticed that, forgot to format the dates as per the original scale. My fault - see the Answer now. – Gavin Simpson Mar 02 '11 at 15:22
  • @Joris Thanks - why did you delete your Answer? It contained some good points not in my answer re the ordering of certain date formats etc. I was about to upvote it and it had disappeared whilst I was answering another comment – Gavin Simpson Mar 02 '11 at 15:30
  • Ordering is a good point. You may want `ordered()` rather than `factor()` (or the appropriate `ordered=TRUE` flag) once you hit different months, single-vs-dual digit days, ... – Dirk Eddelbuettel Mar 02 '11 at 15:37
  • @Gavin @Prasad @Dirk: I deleted my answer to correct it, as `ordered()` still didn't do what I needed. It's there again, this time in the correct way. Just using `ordered()` as I did didn't help, as that function still uses alphabetical order. You have to set the levels explicitly... – Joris Meys Mar 02 '11 at 15:44

The first question is: why do you want to do that? There is no point in showing a coordinate-based plot if your axes are not coordinates. If you really want to do this, you can convert the dates to a factor. Be careful with the order, though:

dts <- c(as.Date( c('31-10-2011', '01-11-2011', '02-11-2011',
           '05-11-2011'),format="%d-%m-%Y"))
dtsf <- format(dts, format= '%d%b')
df <- data.frame(dt=ordered(dtsf,levels=dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()


With factors you have to be careful: the order of the levels is arbitrary unless you set it yourself, for example with an ordered factor and explicit levels. By default the levels are sorted alphabetically, which gets you into trouble with some date formats. If you don't take the order into account, 31Oct ends up after 05Nov:

df <- data.frame(dt=factor(dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()

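When the data are unsorted or a date occurs more than once, passing the labels themselves as `levels` no longer works, because levels must be unique. A minimal sketch under those assumptions (the input dates below are made up for illustration) derives the levels from the sorted dates instead:

```r
library(ggplot2)

# Hypothetical input: unsorted, with 31 Oct appearing twice
dts <- as.Date(c('05-11-2011', '31-10-2011', '01-11-2011', '31-10-2011'),
               format = '%d-%m-%Y')

# One label per row
lab <- format(dts, format = '%d%b')

# Levels: labels in chronological order, each listed once
lev <- unique(format(sort(dts), format = '%d%b'))

df <- data.frame(dt = factor(lab, levels = lev), val = seq_along(dts))
ggplot(df, aes(dt, val)) + geom_point()
```

Rows that share a date land on the same x position. Note that a label without the year, such as `'%d%b'`, would merge the same day from different years, so a longer format may be needed for multi-year data.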

Joris Meys
  • It's pretty common in financial time series to have no data on weekends, so it's visually not nice to have breaks in plots where there are weekends. – Prasad Chalasani Mar 02 '11 at 15:18
  • @Prasad : I see. I'd rather add something like "working days" then on the X axis, as now you give the impression of a continuous function that is in fact not continuous on the X axis. Sounds like nitpicking, but it can be pretty confusing. – Joris Meys Mar 02 '11 at 15:23
  • @JorisMeys What if there are duplicate times? – user5779223 Feb 25 '16 at 13:49