Dear Stack Overflow community,
I am new to statistical programming with R. I have been given the task of creating a simple autoregressive model to forecast, or rather nowcast, the unemployment rate of a country using only Google Trends data. To build the model I have been given a .csv file with the monthly unemployment rates from 2011 to 2015 (5 years) and a .csv file with the weekly Google Trends values for the topic "Unemployment" over the same period.
I have imported both files into RStudio and converted them into monthly time series (60 observations each). Here is an overview:
[Plot: Unemployment Rates vs Google Trends]
I now need help creating the AR model. Please keep in mind that the model should stay as simple as possible; it is not meant to be perfect. Here are my questions:
- Should I use the decomposed time series, even though the results for the decomposed series are not very convincing (the p-values are still high)?
- What would be the simplest way to create an autoregressive model in R from the two time series (unemployment, Google)? The model should then be used to nowcast the current unemployment rate from the current Google Trends value. (I sketch below what I imagine the call might look like.)
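To make the second question more concrete, here is the kind of call I imagine might be needed, based on skimming the forecast package documentation. It uses the two series tsUnemployment and tsGoogle that I build in the code further down. I am not sure whether using xreg is the right approach at all, and the order c(1, 0, 0) is just a guess:

# My guess: an AR model for the unemployment rate with the Google Trends
# series as an external regressor (xreg); the order c(1, 0, 0) is a placeholder
fitGuess <- Arima(tsUnemployment, order = c(1, 0, 0), xreg = tsGoogle)
# Alternatively, letting auto.arima() pick the order for me
fitGuessAuto <- auto.arima(tsUnemployment, xreg = tsGoogle)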
Since I am not very experienced with R, I am getting a bit lost. Help would be greatly appreciated!
Thanks a lot!
Here is the data (samples are provided as comments in the code below).
Here is my code so far:
# Import required libraries
library(lubridate)
library(tseries)
library(xts)
library(forecast)
library(readr)
# # # # # # # # # # # Unemployment Rate # # # # # # # # # # #
unemploymentRate <- read_csv("~/Desktop/UnemploymentRates_2011-2015.csv")
# Unemployment sample: structure(list(`Month` = 1:10, Year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L), UnemploymentRate = c(7.9, 7.9, 7.6, 7.3, 7, 6.9, 7, 7, 6.6, 6.5)), .Names = c("Month", "Year", "UnemploymentRate"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
# Create monthly time series for unemployment rates
tsUnemployment <- ts(unemploymentRate$UnemploymentRate, start = c(2011,1), frequency = 12)
# # # # # # # # # # # Google Trends Topic # # # # # # # # # # #
google <- read_csv("~/Desktop/google.csv", col_types = cols(Week = col_date(format = "%Y-%m-%d")))
# Rename the value column so the later code can refer to it consistently
colnames(google)[2] <- "googleTrends"
#Google sample: structure(list(Week = structure(c(14976, 14983, 14990, 14997, 15004, 15011, 15018, 15025, 15032, 15039), class = "Date"), Unemployment = c(88L, 89L, 100L, 91L, 88L, 88L, 87L, 91L, 89L, 78L)), .Names = c("Week", "Unemployment"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
# Extract month and year from date
google$Month <- month(google$Week)
google$Year <- year(google$Week)
# Aggregate weeks into months using the mean
aggGoogle <- aggregate(googleTrends ~ Month + Year, data = google, FUN = mean)
colnames(aggGoogle)[3] <- "aggGoogleTrends"
# Create monthly time series for the Google Trends
tsGoogle <- ts(aggGoogle$aggGoogleTrends, start = c(2011,1), frequency = 12)
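# Sanity check (my own addition): both series should now cover the same
# 60 months and start at the same point, otherwise the aggregation went wrong
length(tsUnemployment); length(tsGoogle)
start(tsUnemployment); start(tsGoogle)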
# # # # # # # # # # # Decomposition + Analysis # # # # # # # # # # #
decompose_Unemployment <- decompose(tsUnemployment, "additive")
decompose_Google <- decompose(tsGoogle, "additive")
finalUnemployment <- decompose_Unemployment$seasonal + decompose_Unemployment$trend + decompose_Unemployment$random
finalGoogle <- decompose_Google$seasonal + decompose_Google$trend + decompose_Google$random
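# Note: decompose() leaves NAs in trend and random at both ends of the series,
# so finalUnemployment and finalGoogle have missing values there.
# If I only want to remove the seasonal component, I think forecast::seasadj()
# gives a complete seasonally adjusted series (not sure if that is the better choice):
adjUnemployment <- seasadj(decompose_Unemployment)
adjGoogle <- seasadj(decompose_Google)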
Now, I am ready to perform the statistical tests:
adf.test(tsUnemployment, alternative = "stationary")
Box.test(tsUnemployment, type = "Ljung-Box")
# na.omit() drops the NA ends produced by decompose()
Box.test(na.omit(finalUnemployment), type = "Ljung-Box")
adf.test(tsGoogle, alternative = "stationary")
Box.test(tsGoogle, type = "Ljung-Box")
Box.test(na.omit(finalGoogle), type = "Ljung-Box")
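And this is roughly how I imagine the nowcasting step would look once a model exists, although I am not sure it is correct. Here currentGoogle is just a placeholder for the latest aggregated monthly Google Trends value:

# Fit the model on the monthly series (my guess, see question above)
fit <- Arima(tsUnemployment, order = c(1, 0, 0), xreg = tsGoogle)
# Placeholder for the newest monthly Google Trends value
currentGoogle <- 85
# One-step-ahead "nowcast" of the unemployment rate from the current Google value
nowcast <- forecast(fit, xreg = currentGoogle)
nowcast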