I'll start with a simple answer, then go into the details.
A quick way to do this is to check the values of MON and DAY and return the corresponding season. This is straightforward:
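f=function(m,d){
  # Season code: 0 = Spring, 1 = Summer, 2 = Autumn, 3 = Winter
  # (each season is assumed to start on the 21st)
  if(m==12 && d>=21) i=3
  else if(m>9 || (m==9 && d>=21)) i=2
  else if(m>6 || (m==6 && d>=21)) i=1
  else if(m>3 || (m==3 && d>=21)) i=0
  else i=3
  i  # return the code explicitly instead of relying on the invisible value of the assignment
}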
The function f, given a month and a day, returns an integer corresponding to the season (it doesn't matter much whether it's an integer or a string; an integer just saves a bit of memory, which is a technicality).
Now you want to apply it to your data.frame. There is no need for a loop here; we'll use mapply. Below, d is a simulated data.frame, and we convert the output to a factor to get readable season names.
d=data.frame(MON=rep(1:12,each=30),DAY=rep(1:30,12),YEAR=2012)
d$SEA=factor(
mapply(f,d$MON,d$DAY),
levels=0:3,
labels=c("Spring","Summer","Autumn","Winter")
)
There you have it!
I realize seasons don't always change on the 21st. If you need fine tuning, you could define a lookup table as a global variable to store the exact days (a matrix indexed by year and season is enough). Given a season and a year, you would look up the corresponding day and replace the hard-coded 21s in f with those lookups (you would obviously add a third argument for the year).
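The lookup idea above could be sketched roughly as follows. The names season_start and f2 are made up for this example, and the day values are placeholders for illustration only; replace them with dates you have verified for your years and time zone.

```r
# Hypothetical lookup table: one row per year, one column per month
# in which a season starts. The days are illustrative placeholders.
season_start = matrix(
  c(20, 20, 22, 21,
    20, 21, 22, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("2012", "2013"), c("3", "6", "9", "12"))
)

f2 = function(m, d, y) {
  # Day on which a season starts in month `mon` of year y
  s = function(mon) season_start[as.character(y), as.character(mon)]
  if (m == 12 && d >= s(12)) i = 3
  else if (m > 9 || (m == 9 && d >= s(9))) i = 2
  else if (m > 6 || (m == 6 && d >= s(6))) i = 1
  else if (m > 3 || (m == 3 && d >= s(3))) i = 0
  else i = 3
  i
}

f2(6, 20, 2012)  # Summer code with the placeholder table
f2(6, 20, 2013)  # still the Spring code with the placeholder table
```

It can be applied the same way as f, passing the year as a third vector: mapply(f2, d$MON, d$DAY, d$YEAR). Years missing from the table would raise a "subscript out of bounds" error, so the table has to cover every year in the data.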
About the things you mentioned in your question:
- ifelse is the "functional" way to write a conditional test. On a single value it offers little over an if/else statement, but it is vectorized: if the argument is a vector, it applies the test to each element by itself. I'm not very familiar with it, but it's the way to go for an optimized solution.
- mapply is derived from sapply, of the "apply family", and allows you to call a function with several arguments element by element over vectors (see ?mapply).
- I don't think := is a standard operator in R, which brings me to my next point:
- data.table! It's a package that provides a new structure extending data.frame for fast computing and concise typing (among other things). := is an operator from that package that defines new columns. In our case you could write d[,SEA:=mapply(f,MON,DAY)] if d is a data.table.
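As a rough illustration of the vectorized ifelse approach mentioned in the list above, here is a minimal sketch. The name f_vec is made up for this example, and it keeps the same assumption as f that every season starts on the 21st.

```r
# Vectorized variant of f: m and d can be whole columns, so no mapply is needed.
# Note the element-wise operators & and | instead of && and ||.
f_vec = function(m, d) {
  ifelse(m == 12 & d >= 21, 3,
  ifelse(m > 9 | (m == 9 & d >= 21), 2,
  ifelse(m > 6 | (m == 6 & d >= 21), 1,
  ifelse(m > 3 | (m == 3 & d >= 21), 0, 3))))
}

# Same simulated data as before
d = data.frame(MON = rep(1:12, each = 30), DAY = rep(1:30, 12), YEAR = 2012)
d$SEA = factor(
  f_vec(d$MON, d$DAY),
  levels = 0:3,
  labels = c("Spring", "Summer", "Autumn", "Winter")
)
table(d$SEA)  # number of rows per season
```

If d is a data.table, the same function should work inside := as well, for instance d[,SEA:=f_vec(MON,DAY)].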
If you really care about performance, I can't recommend data.table enough, as it is a major improvement when you have a lot of data. I don't know whether it would really change the computing time of the solution I proposed, though.