
My data frame is SPData, which includes the date, open price, closing price and volume for the S&P 500 over the last 25 years (~250 trading days per year). I also have SPData$Year, a numeric column holding the year extracted from the date column, running from 1990 to 2015.

library(dplyr)
SPData1990 <- filter(SPData, Year == 1990)

This results in a data frame with ~250 observations, one for each trading day in 1990. I have already done this for all 25 years.

Is there a way to write a function or loop that saves the data for each year as its own data frame (SPData1991, SPData1992, SPData1993, etc.)? I was trying to think through a for (i in years) loop, with years <- unique(SPData$Year, FALSE), but I am not familiar enough with programming in general to figure this out.
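A minimal sketch of the `for (i in years)` loop described above. It uses base R subsetting in place of `dplyr::filter()` so it runs without extra packages, and the small `SPData` built here is hypothetical stand-in data, not the real S&P 500 series. `assign()` is what turns each year into a separately named `SPDataYYYY` object.

```r
# Hypothetical stand-in for the real SPData: a few rows per year
SPData <- data.frame(
  Year  = rep(1990:1992, each = 3),
  Close = c(330, 331, 329, 375, 376, 380, 410, 412, 415)
)

years <- unique(SPData$Year)

for (i in years) {
  # Rows for year i only (same idea as filter(SPData, Year == i))
  yearData <- SPData[SPData$Year == i, ]
  # Create an object named e.g. "SPData1990" in the current (top-level) environment
  assign(paste0("SPData", i), yearData)
}

# List the per-year objects that the loop created
ls(pattern = "^SPData[0-9]")
```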

Thanks

Rorschach
chris dorn

2 Answers


With thanks to @user20650...

# reproducible example!
set.seed(123)
year_range = 1990:2014
SPData <- data.frame(Year=sample(year_range,1000,replace=TRUE),
                     Sales=runif(1000,min=100,max=200) )

# split the data frame by year into a list of data frames named "SPDataYYYY", then copy each one into the global environment
list2env(split(SPData, paste0("SPData",SPData$Year)),
         envir = .GlobalEnv)
ls()
# [1] "SPData"     "SPData1990" "SPData1991" "SPData1992" "SPData1993" "SPData1994"
# [7] "SPData1995" "SPData1996" "SPData1997" "SPData1998" "SPData1999" "SPData2000"
# [13] "SPData2001" "SPData2002" "SPData2003" "SPData2004" "SPData2005" "SPData2006"
# [19] "SPData2007" "SPData2008" "SPData2009" "SPData2010" "SPData2011" "SPData2012"
# [25] "SPData2013" "SPData2014" "year_range"
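The comments below argue that a named list is usually a better structure than 25 separate data frames. A short sketch of that alternative, regenerating the simulated `SPData` from the example above so the block runs on its own: keep the output of `split()` as a list, look a year up by name, and apply one function to every year at once. `SPList` and `meanSales` are illustrative names, not part of the original answer.

```r
# Same simulated data as the reproducible example above
set.seed(123)
year_range <- 1990:2014
SPData <- data.frame(Year  = sample(year_range, 1000, replace = TRUE),
                     Sales = runif(1000, min = 100, max = 200))

# One list element per year, named "1990", "1991", ...
SPList <- split(SPData, SPData$Year)

# Access a single year by name (list names are character strings)
head(SPList[["1990"]])

# Summarise every year in one call instead of 25 separate ones
meanSales <- sapply(SPList, function(d) mean(d$Sales))
head(meanSales)
```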
C8H10N4O2
  • In my defense, SO threw an extremely slow-loading CAPTCHA at me after I clicked post... – C8H10N4O2 Jul 02 '15 at 02:23
  • I didn't downvote. But suggesting the use of `assign` is bad practice in R: https://twitter.com/hadleywickham/status/535931179556691968. There are many answers on SO that repeatedly warn against using `assign`. – user227710 Jul 02 '15 at 02:37
  • I agree with that view, plus it may not be a good idea to create multiple data.frame objects, but: that's what the poster asked. To do what was asked, `assign` is exactly what is needed. – Ken Benoit Jul 02 '15 at 02:42
  • The OP wanted 25 different data frames. I think hadley's point is not that "`assign` is always a bad practice" but that "there are better data structures than 25 different data frames" and this is probably true – C8H10N4O2 Jul 02 '15 at 02:43
  • Thanks for the answers, very helpful. Why is it bad practice to have that many data frames? – chris dorn Jul 04 '15 at 12:46

You can do this by using split and then assign, naming each new data frame after a level of the splitting factor. Below I illustrate with mtcars; substitute your data frame in the first line (myDf) and your year column in the second line (in place of myDf$gear).

myDf <- mtcars
splitVar <- factor(myDf$gear)
levelsVar <- levels(splitVar)
splitDataFrame <- split(myDf, splitVar)
for (i in seq_along(levelsVar)) {
    # [[i]] extracts the data frame itself; [i] would return a one-element list
    assign(paste0("newDataFrameGear", levelsVar[i]), splitDataFrame[[i]])
}
ls(pattern = "^newData")
## [1] "newDataFrameGear3" "newDataFrameGear4" "newDataFrameGear5"
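Why `[[` rather than `[` matters when pulling one piece out of a `split()` result, shown as a small check on `mtcars`: single brackets return a one-element list, and passing that to `data.frame()` prefixes every column name with the list element name, while double brackets return the data frame unchanged. The object names here are illustrative only.

```r
splitDataFrame <- split(mtcars, factor(mtcars$gear))

oneElementList <- splitDataFrame[1]    # a list holding one data frame
gear3 <- splitDataFrame[[1]]           # the data frame itself

class(oneElementList)
class(gear3)

# Single-bracket route: column names get the element name prefixed (e.g. "X3.mpg")
head(names(data.frame(oneElementList)))
# Double-bracket route: the original mtcars column names are kept
head(names(gear3))
```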
Ken Benoit
  • why the downvotes ... may not be best practice (whatever that is), but it answers the question as stated – user20650 Jul 02 '15 at 02:24
  • Down votes are puzzling to me too - the question was not reproducible but my answer was (using `mtcars`) and poster only needs to substitute his/her data in the first two lines. – Ken Benoit Jul 02 '15 at 02:27
  • @KenBenoit - someone is getting carried away with themselves I think. The question isn't asking about an ideal process, but no need to take it out on the answers! – thelatemail Jul 02 '15 at 02:28
  • Weird. In other news, @user20650, I'm interested in seeing how to use `list2env` to store in the global environment. I looked at it but it just returns `<environment: R_GlobalEnv>` – C8H10N4O2 Jul 02 '15 at 02:39
  • @C8H10N4O2: if you do `ls()` after `list2env` you will see the datasets. For Ken's example you can access one of them with `` `3` ``. Hence why I suggested renaming the list elements before moving them to the global environment. – user20650 Jul 02 '15 at 02:52
  • @user20650 neat -- if you don't mind, I'm going to update my answer using this. – C8H10N4O2 Jul 02 '15 at 03:00
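user20650's suggestion from the comments, sketched on Ken Benoit's `mtcars` example: calling `list2env()` on the raw `split()` result would create objects named `3`, `4` and `5`, which need backticks to access, so the list elements are renamed first. The `newDataFrameGear` prefix simply reuses the naming from the answer above.

```r
pieces <- split(mtcars, mtcars$gear)
names(pieces)                          # "3" "4" "5"

# Give the elements syntactic names before copying them out
names(pieces) <- paste0("newDataFrameGear", names(pieces))

# Copy each list element into the global environment as its own object;
# list2env() returns the environment, so invisible() stops it being printed
invisible(list2env(pieces, envir = .GlobalEnv))

ls(pattern = "^newData")
```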