
I am using the forecast package of R and I would like to display a time series. When I do this, the x-ticks begin at 1 and are always incremented by 1. How can I define the x ticks as dates? So I have two options:

  1. Read the time stamp data from a file
  2. Manually assign the time stamps to the time series

Basically I would like to know how I can implement both options. So here is my code so far.

autoplot(ts(generationData$Demand))

generationData is a data frame that also contains timestamps for the different time series, but they are in a format that is awkward to display ("2019-01-01 01:00:00+01:00", "2019-01-01 02:00:00+01:00" etc.). So I think the second option is better. How can I define the ticks to be, for example, the months of the year 2019 (January, February etc.)?

I'd appreciate every comment.

PeterBe
  • Maybe there are answers to your question in https://stackoverflow.com/questions/4843969/plotting-time-series-with-date-labels-on-x-axis – Ricardo Semião e Castro Oct 20 '20 at 14:04
  • Thanks for the comment Ricardo. As I am new to R I do not understand the answers given in those posts and I do not know how I can adjust them to my case. Is there no direct way to do this in the autoplot command? – PeterBe Oct 20 '20 at 14:18
  • Okay, can you post a few rows of your data? Try copying and pasting the output from `dput(generationData[1:20,])`. – Ricardo Semião e Castro Oct 20 '20 at 14:27
  • Also, you want the date in what format? The x axis should be by month or other breaks? @PeterBe – Ricardo Semião e Castro Oct 20 '20 at 14:41
  • Thanks Ricardo for your answer. As the generationData has 20 rows and 8700 columns the command dput(generationData[1:20,]) created a lot of entries that I can't post. See here an extract of it – PeterBe Oct 20 '20 at 14:50
  • "2019-09-03 15:00:00+02:00", "2019-09-03 16:00:00+02:00", "2019-09-03 17:00:00+02:00", "2019-09-03 18:00:00+02:00", "2019-09-03 19:00:00+02:00", "2019-09-03 20:00:00+02:00", "2019-09-03 21:00:00+02:00", "2019-09-03 22:00:00+02:00", "2019-09-03 23:00:00+02:00", "2019-09-04 00:00:00+02:00", "2019-09-04 01:00:00+02:00", "2019-09-04 02:00:00+02:00", "2019-09-04 03:00:00+02:00", "2019-09-04 04:00:00+02:00", "2019-09-04 05:00:00+02:00", "2019-09-04 06:00:00+02:00", "2019-09-04 07:00:00+02:00", "2019-09-04 08:00:00+02:00", "2019-09-04 09:00:00+02:00", "2019-09-04 10:00:00+02:00" – PeterBe Oct 20 '20 at 14:51
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/223364/discussion-between-ricardo-semiao-e-castro-and-peterbe). – Ricardo Semião e Castro Oct 20 '20 at 15:03

2 Answers

1

As you want the breaks to be monthly, you don't need the hour part, so you can specify that you want your date in the following format:

format = "%Y-%m-%d" #Year, then "-", then month, then "-", then day; the time part of each string is ignored
generationData$Date = as.Date(generationData$Date, format=format)

This assumes the date column is called "Date". Then we create the ggplot (again assuming that the column with the values of the time series is called "Value"):

library(ggplot2)
ggplot(generationData, aes(x=Date, y=Value)) + #We want the x axis to be Date and y to be the value of the ts
  geom_line() + #Creates a line graph
  scale_x_date(date_breaks="1 month", #Sets the breaks to be monthly
               date_labels="%m") #Labels every tick with the month number only

You can easily change the breaks to "3 weeks" for example, and the labels to "%m/%d" for example, to get a mm/dd format.

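But this approach gives the month numbers, which isn't too pretty. To get the names of the months you can use the months function to create a new column:

generationData$Date2 = months(generationData$Date, abbreviate=TRUE)

That column can't be passed to scale_x_date, though, because date_labels expects a format string rather than a vector of labels. To label the ticks with month names, use "%b" (abbreviated) or "%B" (full name) instead:

scale_x_date(date_breaks="1 month", #Sets the breaks to be monthly
             date_labels="%b") #Labels every tick with the abbreviated month name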
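Putting the pieces together, here is a minimal sketch you can run without the real file. The data is synthetic and only stands in for generationData (the column names "Date" and "Value" are the same assumptions as above). One thing to watch: with hourly data, converting to Date leaves 24 values per day, so this sketch averages them to one value per day before plotting. That averaging step is my assumption about what you want to see, not something the question specifies.

```r
library(ggplot2)

# Synthetic stand-in for generationData: hourly timestamps for 2019 (assumed layout)
stamps <- seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
              as.POSIXct("2019-12-31 23:00:00", tz = "UTC"), by = "hour")
generationData <- data.frame(
  Date  = format(stamps, "%Y-%m-%d %H:%M:%S+00:00"),
  Value = 100 + 10 * sin(seq_along(stamps) * 2 * pi / 24)
)

# Keep only the calendar day; everything after "%Y-%m-%d" is ignored
generationData$Date <- as.Date(generationData$Date, format = "%Y-%m-%d")

# One value per day, so the line does not zigzag through 24 points per date
daily <- aggregate(Value ~ Date, data = generationData, FUN = mean)

p <- ggplot(daily, aes(x = Date, y = Value)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
print(p)
```

The month names produced by "%b" follow your R session's locale, so they may not be in English on every machine.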

There is probably a simpler way using only the autoplot function, so I encourage you to try to understand the answers that only use that. I hope I didn't make it too complicated :).

  • Thanks a lot Ricardo for your answer. Can you think of a simpler way of doing this (maybe using autoplot)? – PeterBe Oct 21 '20 at 07:39
1

Quick disclaimer: in the end I think ggplot is easier. I'm going to try explaining this in a way you can generalize, which can make it seem complicated, but it's not that hard. Also, I'm no genius at autoplot, so maybe there is an easier way that I don't know of. Lastly, I use "y" for the time series column and "date" for the dates.

Reading your dates as a date-time object is useful even for the 'simpler' autoplot approach, and it's not hard:

format = "%Y-%m-%d %H:%M:%S"
df$date = as.POSIXct(df$date, format=format, tz="your time zone code here")
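The format above stops reading after the seconds, so the "+01:00" / "+02:00" offset at the end of your strings is ignored and the clock time is interpreted in whatever tz you pass. If you would rather have R read the offset itself, here is a small sketch using only base R. It removes the colon from the offset so that %z can parse it. The display zone "Europe/Berlin" is just my guess from the offsets in your extract.

```r
# Example strings in the format shown in the question
raw <- c("2019-01-01 01:00:00+01:00", "2019-09-03 15:00:00+02:00")

# Turn "+01:00" into "+0100" so that %z can read the offset
cleaned <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", raw)

# Parse to POSIXct; the values are stored as instants and shown here in UTC
stamps <- as.POSIXct(cleaned, format = "%Y-%m-%d %H:%M:%S%z", tz = "UTC")

# Display them in an assumed local zone
format(stamps, "%Y-%m-%d %H:%M", tz = "Europe/Berlin")
```

Both approaches give the same local clock times when the tz you pass matches the offsets in the file, so use whichever you find easier to read.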

Values for the limits

d = which(df$date=="2019-10-01 00:00:00") #First date you want
e = which(df$date=="2019-12-01 00:00:00") #Last date you want

Values for the x axis breaks. The breaks only matter for the data inside the limits, so keep that in mind when setting a and n.

a = 1 #The row where the ticks should start. If you wanted that to be the 1st of October, for example:
a = which(df$date=="2019-10-01 00:00:00")
n = 12 #The number of months there will be in the ticks
k = 720 #The conversion factor, in this case months-->hours (approximate: 30 days x 24 hours)

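brks = seq(a, n*k, k) #Row positions of the ticks
autoplot(ts(df$y)) + #autoplot needs a ts object, as in the question
  scale_x_continuous(limits=c(d, e), breaks=brks, #Limits go here; a separate xlim() would be replaced by this scale
                     labels=months(df$date[brks], TRUE)) #One label per break
#Set FALSE to not use abbreviations, set labels=seq_along(brks) to use numbers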
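To see what the fixed step k = 720 does in practice, here is a small base R sketch on a synthetic hourly index for 2019 (the real df$date would take the place of dates; that substitution is an assumption on my part). It compares the fixed-step tick positions with positions computed from the dates themselves.

```r
# Synthetic hourly index for 2019, standing in for df$date
dates <- seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2019-12-31 23:00:00", tz = "UTC"), by = "hour")

# Fixed step: every 720 rows (30 days), starting at row 1
approx_breaks <- seq(1, 12 * 720, 720)

# Exact month starts: rows that fall on day 1 at hour 00
exact_breaks <- which(format(dates, "%d %H") == "01 00")

# Compare the calendar day each tick lands on (month-day)
data.frame(
  approx = format(dates[approx_breaks], "%m-%d"),
  exact  = format(dates[exact_breaks], "%m-%d")
)
```

With the fixed step the second tick already lands on 31 January instead of 1 February, and the twelfth lands on 27 November instead of 1 December. If that drift matters, exact_breaks can be passed as the breaks instead of seq(a, n*k, k).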

As you can see, because you don't pass df$date as an argument to autoplot, you have to work out the breaks and limits yourself and feed them to it as row positions, which you don't need to do in ggplot. You don't have to understand every option ggplot offers for now; you can just try to memorize this structure:

ggplot(df, aes(x=date, y=y)) + #Pass the data frame, then the x and y column names inside "aes()"
  geom_line() + #For time series, you'll probably always want a line graph
  #scale_x_datetime because date is POSIXct here (scale_x_date is for Date columns)
  scale_x_datetime(date_breaks="1 month", date_labels="%m", #Use date_labels="%b" to get month names
                   limits=as.POSIXct(c("first date", "last date"), format=format))

Instead of defining a, n and k and creating a sequence for the breaks, we just say "1 month", and instead of looking for where the limit dates are in our df, we just state them. It's also easier to change the scale: if you want weekly ticks, just change the breaks to "1 week" and the labels to "%W", whereas with autoplot you need to recalculate a, n and k. Sorry if it seemed complicated again.