I am still somewhat new to R programming although I have figured out how to get some of my required forecasting models to work. The one problem that I have is that I can't get the Date column from my dataset to print on the x-axis of my final ARIMA ggplot rendering. It just shows an index going in intervals from 0 to 200 to 400 to 600. Everything else works except for the date display. Here is my full code:

sentiments = read.csv('trollscores.csv', stringsAsFactors = FALSE)
sentiments$Date <- as.Date(sentiments$Date, format = "%m/%d/%Y")

library(ggplot2)
library(forecast)
library(tseries)
library(timetk)

# Refer to the column by name inside aes(); ggplot looks it up in `sentiments`
ggplot(sentiments, aes(Date, SentimentLM)) + geom_line() +
  scale_x_date('month') + ylab("Sentiment Scores") +
  xlab("")

count_ts = ts(sentiments[,c('SentimentLM')])
sentiments$clean_cnt = tsclean(count_ts)

ggplot() +
geom_line(data = sentiments, aes(x = Date, y = clean_cnt)) +
ylab('Cleaned Bicycle Count')

# using the clean count with no outliers
sentiments$cnt_ma = ma(sentiments$clean_cnt, order=7)
sentiments$cnt_ma30 = ma(sentiments$clean_cnt, order=30)

ggplot() +
  geom_line(data = sentiments, aes(x = Date, y = clean_cnt, colour = "Counts")) +
  geom_line(data = sentiments, aes(x = Date, y = cnt_ma, colour = "Weekly Moving Average")) +
  geom_line(data = sentiments, aes(x = Date, y = cnt_ma30, colour = "Monthly Moving Average")) +
  ylab('Sentiment Score')

count_ma = ts(na.omit(sentiments$cnt_ma), frequency=30)
decomp = stl(count_ma, s.window="periodic")
deseasonal_cnt <- seasadj(decomp)
plot(decomp)

adf.test(count_ma, alternative = "stationary")

Acf(count_ma, main='')

Pacf(count_ma, main='')

count_d1 = diff(deseasonal_cnt, differences = 1)
plot(count_d1,col="blue")
adf.test(count_d1, alternative = "stationary")

Acf(count_d1, main='ACF for Differenced Series')
Pacf(count_d1, main='PACF for Differenced Series')

auto.arima(deseasonal_cnt, seasonal=FALSE)

fit<-auto.arima(deseasonal_cnt, seasonal=FALSE)
tsdisplay(residuals(fit), lag.max=45, main='(0,1,2) Model Residuals')

fit2 = arima(deseasonal_cnt, order=c(1,1,7))
tsdisplay(residuals(fit2), lag.max=15, main='Seasonal Model Residuals')

fcast <- forecast(fit2, h=30)
plot(fcast)

hold <- window(ts(deseasonal_cnt), start=700)

fit_no_holdout = arima(ts(deseasonal_cnt[-c(700:725)]), order=c(1,1,7))

fcast_no_holdout <- forecast(fit_no_holdout,h=25)
plot(fcast_no_holdout, main=" ")
lines(ts(deseasonal_cnt))

library(sweep)
library(tidyquant)

ne_sweep <- sw_sweep(fcast,timetk_idx = TRUE,rename_index = "date")

# Visualizing the forecast
ne_sweep %>%
ggplot(aes(x = date, y = value, color = key)) +
# Prediction intervals
geom_ribbon(aes(ymin = lo.95, ymax = hi.95), 
          fill = "#D5DBFF", color = NA, size = 0) +
geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key), 
          fill = "#596DD5", color = NA, size = 0, alpha = 0.8) +
# Actual & Forecast
geom_line(size = 1) + 
geom_point(size = 2) +
# Aesthetics
theme_tq(base_size = 16) +
scale_color_tq() +
labs(title = "Sentiment 3-Year Forecast", x = "", y = "Level of sentiment") 

dput(head(ne_sweep))
structure(list(date = c(1, 1.03333333333333, 1.06666666666667, 
1.1, 1.13333333333333, 1.16666666666667), key = c("actual", "actual", 
"actual", "actual", "actual", "actual"), value = c(-0.00117229792495284, 
-0.00204034959504821, -0.00293998225125085, -0.00263003238897274, 
-0.00176165038488553, -0.00190213131023263), lo.80 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), lo.95 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), hi.80 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), hi.95 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))


    Welcome to StackOverflow! For code debugging please ask with a [reproducible](https://stackoverflow.com/q/5963269/1422451) example per the [MCVE](https://stackoverflow.com/help/mcve) and [`r`](https://stackoverflow.com/tags/r/info) tag description, with the desired output. You can use `dput()`, `reprex::reprex()` or built-in data sets for reproducible data. – Hack-R Oct 01 '18 at 18:54
    ...for example, you shared quite a lot of code. I can't immediately tell which of your `ggplot` calls is the one you're asking about. – joran Oct 01 '18 at 18:56
  • Joran, to answer your question as to which ggplot I was referring to, it was the very last one in the code that I posted. It is the ggplot that starts with: "# Visualizing the forecast ne_sweep %>% ggplot(aes(x = date, y = value, color = key)) + # Prediction intervals" – Jonathan Adkins Oct 01 '18 at 21:35

1 Answer


I suspect you need to add a scale term to tell ggplot how to translate the index column called date onto the x axis as a date:

scale_x_yearmon(n = 12, format = "%Y %m")

EDIT: Alternative solution after sample data was added to question.

It looks like the index column starts at 1.0 and increases by 1/30 for each subsequent row. Assuming each step encodes one day, the code below converts the index to a day offset, day_num, and adds it to start_date to get a ggplot-friendly date in the new date2 column.

library(dplyr)
start_date <- as.Date("2015-01-01")  # Replace with first day of data range
ne_sweep_dates <- ne_sweep %>%
  as_tibble() %>%
  mutate(day_num = (date-1)*30) %>%
  mutate(date2 = start_date + day_num)
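
As a quick check of that conversion, here is a minimal base-R sketch on a six-row index built the same way `ts(..., frequency = 30)` builds it. The `start_date` value is a placeholder, as above, and the `round()` call is an addition that is not part of the code above: `(date - 1) * 30` can come out as something like 0.9999999 because of floating-point error, and a fractional Date may print as the previous day.

```r
# Index values as produced by ts(..., frequency = 30): 1, 1 + 1/30, 1 + 2/30, ...
idx <- as.numeric(time(ts(1:6, frequency = 30)))

# Placeholder first day of the data range (an assumption, replace with the real one)
start_date <- as.Date("2015-01-01")

# Convert the fractional index to whole-day offsets; round() removes
# floating-point noise so each offset is an exact number of days
day_num <- round((idx - 1) * 30)
date2 <- start_date + day_num

print(date2)
```

If the printed dates advance one day per row, the same two steps should carry over to the full `ne_sweep` table.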

#library(ggplot2); library(tidyquant)
ne_sweep_dates %>%
  ggplot(aes(x = date2, y = value, color = key)) +
  # Prediction intervals
  geom_ribbon(aes(ymin = lo.95, ymax = hi.95), 
              fill = "#D5DBFF", color = NA, size = 0) +
  geom_ribbon(aes(ymin = lo.80, ymax = hi.80, fill = key), 
              fill = "#596DD5", color = NA, size = 0, alpha = 0.8) +
  # Actual & Forecast
  geom_line(size = 1) + 
  geom_point(size = 2) +
  # Aesthetics
  theme_tq(base_size = 16) +
  scale_color_tq() +
  labs(title = "Sentiment 3-Year Forecast", x = "", y = "Level of sentiment") 
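
If guessing start_date is a concern, the dates could instead be taken from the original data frame. The sketch below is only an illustration under several assumptions: the data are daily with no gaps, the modelled series is the 7-day moving average with its NA ends dropped (as in the question's `na.omit(sentiments$cnt_ma)`), and the forecast table has one row per fitted value followed by one row per forecast step. The small `sentiments` data frame and the horizon `h` here are hypothetical stand-ins, and `stats::filter` is used in place of `forecast::ma` to keep the example in base R.

```r
# Hypothetical stand-in for the question's data: 20 daily observations
sentiments <- data.frame(
  Date = seq(as.Date("2018-01-01"), by = "day", length.out = 20),
  SentimentLM = sin(seq_len(20) / 3)
)

# Centred 7-day moving average; the first and last 3 values are NA
sentiments$cnt_ma <- as.numeric(
  stats::filter(sentiments$SentimentLM, rep(1 / 7, 7), sides = 2)
)

h <- 5  # forecast horizon (the question uses 30)

# Dates of the rows that survive na.omit(), i.e. the rows the model saw
actual_dates <- sentiments$Date[!is.na(sentiments$cnt_ma)]

# Forecast dates continue day by day after the last fitted date
forecast_dates <- seq(max(actual_dates) + 1, by = "day", length.out = h)

# One date per row of the forecast table: actuals first, then forecasts
all_dates <- c(actual_dates, forecast_dates)
print(range(all_dates))
```

Provided `length(all_dates)` equals `nrow(ne_sweep)`, the vector could be assigned as a new column (for example `ne_sweep$date2 <- all_dates`) and used as the x aesthetic in place of the computed date2.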


Jon Spring
  • Jon Spring, I added the `scale_x_yearmon(n = 12, format = "%Y %m")` statement to my ggplot and I got a different set of numbers on the x axis. The index now starts as: 0000 01, 0050 01, 0100 01, 0200 01, 0250 01...and so forth. – Jonathan Adkins Oct 01 '18 at 22:18
  • Can you share a sample of what ne_sweep looks like? e.g. What's the output from `dput(head(ne_sweep))`? – Jon Spring Oct 01 '18 at 22:22
  • Jon Spring, I could not provide the full output. Too many characters. Here is a sample `dput(head(ne_sweep)): structure(list(date = c(1, 1.03333333333333, 1.06666666666667, 1.1, 1.13333333333333, 1.16666666666667), key = c("actual", "actual", "actual", "actual", "actual", "actual"), value = c(-0.00117229792495284, -0.00204034959504821, -0.00293998225125085, -0.00263003238897274, -0.00176165038488553, -0.00190213131023263), lo.80 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), lo.95 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), hi.80 = c(NA_rea` – Jonathan Adkins Oct 01 '18 at 22:55
  • Could you please include in the question above? – Jon Spring Oct 01 '18 at 23:33
  • Jon Spring, I added the `dput` output to the original question as requested. – Jonathan Adkins Oct 02 '18 at 00:00
  • Jon Spring, I just added the dplyr package and the code that you posted above in this thread. How am I supposed to adjust the ggplot code chunk to take advantage of this new addition? The final plot still looks the same. – Jonathan Adkins Oct 02 '18 at 00:58
  • I added the `na_sweep_dates` variable to the beginning of the ggplot statement in place of the original `na_sweep` variable. I also changed the aesthetic part of the code chunk to `x=date2`. When I ran the whole thing I got this error: `Error in charToDate(x) : character string is not in a standard unambiguous format.` Do I need to change the scale statement in the ggplot code chunk? – Jonathan Adkins Oct 02 '18 at 01:14