I'm working on a project for my Data Science class. My question for the project is "Is Americans' financial satisfaction dependent on, or affected by, the annual return of the S&P 500 in the prior year?" This is an observational study. I have broken down information from other datasets so that I now have 56,000 cases, with variables for year and financial satisfaction. I also have the annualized returns of the S&P 500 in percent terms from 1971 through 2013.

I now have to take the annual return for 1971 and apply it to ALL cases under the year 1972, in a new column of the dataset called spReturns. So essentially the return will always be the one from year - 1. I'm new to R and have no idea how to do this, so I was hoping I could get some help. My code is below in case you need to replicate it.

install.packages("lubridate")
install.packages("zoo")
install.packages("xts")
install.packages("Quandl")

require(Quandl)
require(lubridate)
require(zoo)
require(xts)

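# load() restores the saved objects (here the data frame `gss`) into the
# workspace; its return value is only a character vector of their names
load(url("http://bit.ly/dasi_gss_data"))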

year <- gss$year
finSat <- gss$satfin

relativeTable <- data.frame(year, finSat)
relativeTable <- subset(relativeTable, year > 1988) # compare as a number, not the string "1988"


spReturns <- Quandl("SANDP/ANNRETS", trim_start="1970-01-11", 
                    trim_end="2012-12-31", authcode="nwy3a_Gmd7TSS9fVirxT", 
                    collapse="annual")

percentChange <- spReturns$"Total Return Change"

spReturns$"Year Ending" <- format((spReturns$"Year Ending"), "%Y")
spReturns$"Year Ending" <- as.numeric(spReturns$"Year Ending")
spReturns$"Year Ending" <- spReturns[,1] + 1 #the following year
Jason T. Eyerly
  • Since you said this is for homework, I'll leave the execution up to you, but here are some thoughts. 1) I would extract both the year and the `Total Return Change` columns from the Quandl data. 2) I would think about the arithmetic you outlined above and how you may want to adjust the year data. 3) I would look at the `merge` function to join your data.frames together based on year, or lagged year, or whatever is appropriate for your use case. – Chase Sep 16 '14 at 03:45
  • Also, this may be helpful for extracting information about the year: http://stackoverflow.com/questions/9749598/r-obtaining-month-and-year-from-a-date – Chase Sep 16 '14 at 03:46
  • It's for a Coursera class; that counts as homework, right? We have learned a lot about the statistics side of things, but not R programming, so I know very little. This was suggested: `spReturns$lagYear <- format(index(spReturns), "%Y")` but it returns: Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : invalid 'trim' argument – Jason T. Eyerly Sep 16 '14 at 04:04
  • `Quandl` is pretty clearly from a non-base package which you have not mentioned. We need in addition the results of `class(index(spReturns))` at a minimum and, even better, `dput(head(index(spReturns)))` – IRTFM Sep 16 '14 at 04:27
  • I've updated the code to what I have figured out at this point, and added the packages required. – Jason T. Eyerly Sep 16 '14 at 04:32
  • Based on what I have now, I just need to figure out how to add a new column, and attach the spReturns$"Total Return Change" to relativeTable, based on the correct alignment of years. Any suggestions on how to go about this? – Jason T. Eyerly Sep 16 '14 at 04:48
  • This sounds like a simple application of `match` to some data frames, but obfuscated by a whole load of unnecessary code in the example. Try to make a really simple example (with maybe a dozen items) that doesn't rely on a 2-megabyte download. – Spacedman Sep 16 '14 at 07:11
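
The `match` approach mentioned in the last comment could look roughly like the sketch below. It is only an illustration on a tiny made-up dataset: the data frames `survey` and `returns`, their column names, and all values are assumptions chosen for the example, not the GSS or Quandl data.

```r
# Toy survey data: one row per respondent (made-up values)
survey <- data.frame(
  year   = c(1972, 1972, 1973, 1974),
  finSat = c("Satisfied", "Not At All Sat", "More Or Less", "Satisfied")
)

# Toy annual returns in percent (made-up numbers, not real S&P 500 data)
returns <- data.frame(
  yearEnding  = c(1971, 1972, 1973),
  totalReturn = c(10, 15, -5)
)

# For each survey row, find the position of the PRIOR year in `returns`
idx <- match(survey$year - 1, returns$yearEnding)

# Look up the return at that position; a year with no match would give NA
survey$spReturns <- returns$totalReturn[idx]

print(survey)
```

Because `match` returns one position per survey row, every respondent in a given year receives the same prior-year return, and the survey data keeps its original row order.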

1 Answer

After adding 1 to each year in the returns data, so that each return lines up with the survey year it should be applied to, I tacked on the code below. The merge() function in R creates a new dataset from the two data frames given, joining them on the columns named in "by.x" and "by.y". In this situation, by.x is year and by.y is Year Ending. The second line of code then creates one more dataset, using only the columns that are important for my purposes.

combined <- merge(relativeTable, spReturns, by.x = "year", by.y = "Year Ending")
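# combined has no percentChange column; the returns are in "Total Return Change"
finalResults <- data.frame(year = combined$year, finSat = combined$finSat, percentChange = combined$`Total Return Change`)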
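
For anyone who wants to try the same idea without the downloads, here is a minimal sketch of the shift-then-merge step on toy data. The data frames, the column names `yearEnding` and `totalReturn`, and the numbers are all made up for illustration; `all.x = TRUE` is an optional addition (not used in the code above) that keeps survey rows with no matching return instead of dropping them.

```r
# Toy survey data (made-up values)
survey <- data.frame(
  year   = c(1989, 1989, 1990, 1991),
  finSat = c("Satisfied", "More Or Less", "Satisfied", "Not At All Sat")
)

# Toy annual returns in percent (made-up numbers, not real S&P 500 data)
returns <- data.frame(
  yearEnding  = c(1988, 1989),
  totalReturn = c(12, 8)
)

# Shift each return forward one year so it lines up with the survey year
# it should be applied to (the 1988 return goes with 1989 respondents)
returns$yearEnding <- returns$yearEnding + 1

# Join on the year columns; all.x = TRUE keeps survey rows without a return
combined <- merge(survey, returns,
                  by.x = "year", by.y = "yearEnding", all.x = TRUE)

print(combined)
```

In this toy case the 1991 row has no return available, so its `totalReturn` is NA; without `all.x = TRUE` that row would be left out of the result.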
Jason T. Eyerly