
Dear Stack Overflow community,

I am new to the world of statistical programming using R. I have been given the task of creating a simple autoregressive model with which one could forecast, or should I say, nowcast the unemployment rate in a country using only data from Google Trends. To create the model, I have been given a .csv file containing the unemployment rates between 2011 and 2015 (5 years) and a .csv file containing the Google Trends values for the topic "Unemployment" (2011-2015).

As one can imagine, I have imported both files into RStudio and converted them into time series (60 months). Here is an overview:

[Figure: Unemployment Rates vs Google Trends]

I would now need help creating that AR model. Please keep in mind that this model should remain as simple as possible and it is not intended to be perfect. Here are my questions:

  • Should I use decomposed time series, even though the results for the decomposed series are not that convincing (the p-values are still high)?
  • What would be the simplest way to create an autoregressive model in R from the two time series (unemployment, google)? This model should then be used to nowcast the current unemployment rate from the current Google Trends value.

Since I am not very experienced with R, I am getting a bit lost. Help would be greatly appreciated!

Thanks a lot!

Here is the data (samples are provided in the code below).
Here is my code so far:

# Import required libraries
library(lubridate)
library(tseries)
library(xts)
library(forecast)
library(readr)

# # # # # # # # # # # Unemployment Rate # # # # # # # # # # #

unemploymentRate <- read_csv("~/Desktop/UnemploymentRates_2011-2015.csv")

# Unemployment sample: structure(list(`Month` = 1:10, Year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L), UnemploymentRate = c(7.9, 7.9, 7.6, 7.3, 7, 6.9, 7, 7, 6.6, 6.5)), .Names = c("Month", "Year", "UnemploymentRate"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Create monthly time series for unemployment rates
tsUnemployment <- ts(unemploymentRate$UnemploymentRate, start = c(2011,1), frequency = 12)

# # # # # # # # # # # Google Trends Topic # # # # # # # # # # #


# The date column is called "Week" in the sample below and in the code that follows
google <- read_csv("~/Desktop/google.csv", col_types = cols(Week = col_date(format = "%Y-%m-%d")))
# Rename the value column; the same name is used in the aggregation below
colnames(google)[2] <- "googleTrends"

#Google sample: structure(list(Week = structure(c(14976, 14983, 14990, 14997, 15004, 15011, 15018, 15025, 15032, 15039), class = "Date"), Unemployment = c(88L, 89L, 100L, 91L, 88L, 88L, 87L, 91L, 89L, 78L)), .Names = c("Week", "Unemployment"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Extract month (as a number) and year from date
google$Month <- month(google$Week)
google$Year <- year(google$Week)

# Aggregate weeks into months using the mean
aggGoogle <- aggregate(googleTrends ~ Month + Year, data = google, FUN = mean)
colnames(aggGoogle)[3] <- "aggGoogleTrends"

# Create monthly time series for the Google Trends
tsGoogle <- ts(aggGoogle$aggGoogleTrends, start = c(2011,1), frequency = 12)

# # # # # # # # # # # Decomposition + Analysis # # # # # # # # # # #

decompose_Unemployment <- decompose(tsUnemployment, "additive")
decompose_Google <- decompose(tsGoogle, "additive")

finalUnemployment <- decompose_Unemployment$seasonal + decompose_Unemployment$trend + decompose_Unemployment$random
finalGoogle <- decompose_Google$seasonal + decompose_Google$trend + decompose_Google$random

Now, I am ready to perform the statistical tests:

adf.test(tsUnemployment, alternative = "stationary")
Box.test(tsUnemployment, type = "Ljung-Box")
Box.test(finalUnemployment, type = "Ljung-Box")

adf.test(tsGoogle, alternative = "stationary")
Box.test(tsGoogle, type = "Ljung-Box")
Box.test(finalGoogle, type = "Ljung-Box")
David C.
m_aTT
  • This question sounds like a better fit for [crossvalidated](http://stats.stackexchange.com/), which is a Q&A site for statistical advice. – eipi10 Feb 04 '17 at 20:09
  • Welcome to Stackoverflow! It'll be useful if you can provide an executable example... – David C. Feb 04 '17 at 20:10
  • @DavidC. Hi David, thanks a lot for the quick response. As you realised, I am new to Stackoverflow and I'm not entirely sure what you meant by "provide an executable"? Would you like me to upload my R script? So far I have only converted the data into time series, created a few plots, and tried to decompose the said time series. – m_aTT Feb 04 '17 at 20:21
  • You can paste your code as part of your post. There is something like `{ }` that will format your code in a reader-friendly format. This way, others can possibly run your code and/or check potential problems easily. – David C. Feb 04 '17 at 20:23
  • @eipi10 Hi eipi, thanks a lot for the link. There is quite some good stuff in there! I hope I'll be able to find some answers. – m_aTT Feb 04 '17 at 20:24
  • You might also be interested in the free online textbook [Forecasting: Principles and Practice](https://www.otexts.org/fpp), by Rob Hyndman, who is also the author of the `R` [forecast package](https://cran.r-project.org/web/packages/forecast/index.html). – eipi10 Feb 04 '17 at 20:30
  • By "Executable Example" I believe @DavidC. means a [Reproducible Example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That basically means the minimal code and data necessary to understand and answer your question. – eipi10 Feb 04 '17 at 20:32
  • @eipi10 Thanks again. That could come in really handy! I've uploaded the code but not entirely sure how I can upload the data. Sorry for all the stupid questions. As I said, I am fairly new to all this... – m_aTT Feb 04 '17 at 20:44
  • @DavidC. I've just uploaded the code and posted a link to the data sets. Hope this can help you better understand the problem. – m_aTT Feb 04 '17 at 20:53
  • We don't need all the data posted, just the minimal amount necessary to understand and answer your question--in this case, maybe 10 or 20 corresponding rows from each data file. For example, paste into your question the output of `dput(google[1:10,])` to provide the first 10 rows of the google data. – eipi10 Feb 04 '17 at 21:01
  • I've looked into the AR(1) model and the ARIMA model, which I think could do the job, but I am not quite sure how to implement either of them... – m_aTT Feb 04 '17 at 21:03
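
The `dput` suggestion from the comments above can be sketched as follows. The small data frame is built from the first unemployment values quoted in the question; the round-trip check at the end is only an illustration that the printed text recreates the object.

```r
# First rows of the unemployment data, as quoted in the question
unemploymentRate <- data.frame(
  Month = 1:5,
  Year = rep(2011L, 5),
  UnemploymentRate = c(7.9, 7.9, 7.6, 7.3, 7.0)
)

# Print a copy-pasteable representation of the first 3 rows
dput(unemploymentRate[1:3, ])

# Round trip: capture that text and evaluate it to rebuild the object
txt <- capture.output(dput(unemploymentRate[1:3, ]))
rebuilt <- eval(parse(text = txt))
print(all.equal(rebuilt, unemploymentRate[1:3, ]))
```

Pasting the `dput` output into a question lets others recreate the exact object, including column types, without access to the original file.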

1 Answer


(As @eipi10 commented, this is more of a question for Cross Validated, Data Science, or Mathematics, especially since you don't seem to have issues with the code or the statistical tests. If the answers you get here don't help, you should consider asking in those places.)

Suggestion for Question 1: This question is particularly difficult to answer, as it depends heavily on your data. Based on this page, if you decide to use an AR model, then applying a decomposition model is an appropriate thing to do. However, this does not mean that decomposition is your only option.
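
As a small illustration of what `decompose` returns, here is a sketch on a simulated monthly series (the series `y` and its coefficients are invented for this example, not taken from the question's data). It also shows that adding the seasonal, trend and random components back together reproduces the original values wherever the trend is defined, so that sum carries no information beyond the original series.

```r
set.seed(1)

# Simulated 5-year monthly series: linear trend + yearly cycle + noise
# (illustrative only, not the asker's data)
idx <- 1:60
y <- ts(7 - 0.02 * idx + 0.3 * sin(2 * pi * idx / 12) + rnorm(60, sd = 0.1),
        start = c(2011, 1), frequency = 12)

dec <- decompose(y, type = "additive")

# Recombining the three components gives back the original series,
# except for NAs at each end where the centred moving average is undefined
rebuilt <- dec$seasonal + dec$trend + dec$random
ok <- !is.na(rebuilt)
print(sum(!ok))
print(all.equal(as.numeric(rebuilt[ok]), as.numeric(y[ok])))

# Seasonally adjusted series: original minus the estimated seasonal component
adjusted <- y - dec$seasonal
print(head(adjusted))
```

If decomposition is used at all, a component such as the seasonally adjusted series is probably the more useful input for further modelling than the recombined sum.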

Suggestion for Question 2: The simplest way to fit autoregressive (AR) models in R is with the stats package. The function stats::ar should work for you, provided that your data is a time series. If your data is a data.frame rather than a time series (ts) object, you can convert it with stats::ts.
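
A minimal sketch of that suggestion, using simulated monthly data rather than the asker's files (the series `unemp` and `trends`, the value `new_trend`, and all coefficients are made up for illustration). It first fits a pure AR(1) model with `stats::ar`. It then fits an AR(1) model with the Google series as an external regressor via `stats::arima(..., xreg = )`, which is one possible way to nowcast from a current Trends value; the answer above only mentions `stats::ar`, so treat the second part as an assumption about what would suit the task.

```r
set.seed(42)
n <- 60  # 60 months, 2011-2015

# Simulated stand-ins for the real series (assumption: not the asker's data)
trends <- ts(80 + as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 5)),
             start = c(2011, 1), frequency = 12)
unemp <- ts(2 + 0.06 * as.numeric(trends) +
              as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.2)),
            start = c(2011, 1), frequency = 12)

# 1) Pure autoregression on unemployment alone;
#    order.max = 1 with aic = FALSE forces an AR(1) fit
fit_ar <- ar(unemp, order.max = 1, aic = FALSE)
pred_ar <- predict(fit_ar, n.ahead = 1)$pred

# 2) Regression on Google Trends with AR(1) errors
x <- cbind(trends = as.numeric(trends))
fit_x <- arima(unemp, order = c(1, 0, 0), xreg = x)

# Nowcast for the next month from a hypothetical newly observed Trends value
new_trend <- 85
nowcast <- predict(fit_x, n.ahead = 1, newxreg = cbind(trends = new_trend))$pred

print(pred_ar)
print(nowcast)
```

With the real data, `unemp` and `trends` would be replaced by `tsUnemployment` and `tsGoogle` from the question, and `new_trend` by the latest monthly Google Trends average.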

David C.