I have a data frame that looks like this:

species<-"ABC"
ind<-rep(1:4,each=24)
hour<-rep(seq(0,23,by=1),4)
month<-rep(seq(1,12),8)
depth<-runif(length(ind),1,50)
df<-data.frame(species,ind,month,hour,depth)

I would like to use the `month` column to assign each row to a season and store the result in a new column of the same data frame. I have been using the following code for the seasons, which seems to work fine:

# Classify months into seasons

summer<-c(1,2,12)
fall<-c(3,4,5)
winter<-c(6,7,8)
spring<-c(9,10,11)

# Create a new column with seasons

df$season<-NA
for(i in 1:nrow(df)){
  if(df$month[i]%in%summer){df$season[i]<-"1-summer"} else
    if(df$month[i]%in%fall){df$season[i]<-"2-fall"} else
      if(df$month[i]%in%winter){df$season[i]<-"3-winter"} else
        if(df$month[i]%in%spring){df$season[i]<-"spring"} 

}

However, this loop is already inside a bigger loop that runs over larger, more complex data sets, so I am looking for a faster, more efficient approach. I am using a loop rather than cutting or subsetting my original data frame because the outer loop separates the individual animals and performs the analyses on each one. The length of the resulting data frame varies between animals, and not all animals were present in all months, so when I tried to assign seasons inside the loop for animals that were not present in a particular season, R gave me an error message...

user1626688

2 Answers

seasons <- c("1-summer", "2-fall", "3-winter", "spring")
df$season2 <- factor(trunc(df$month %% 12 / 3) + 1, labels = seasons)
table(df$season, df$season2)

You can convert df$season2 to character if you wish.
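
To see why the arithmetic works, here is a small sketch that applies the same expression to the month numbers 1 to 12. The variable names `month` and `season_index` are chosen only for this illustration. Taking the month modulo 12 moves December to 0, so that December, January and February fall into the same group of three.

```r
# Month numbers 1..12
month <- 1:12

# %% binds tighter than /, so this is (month %% 12) / 3
season_index <- trunc(month %% 12 / 3) + 1

# Show each month next to its season index
print(data.frame(month, season_index))
```

Index 1 corresponds to the first label in `seasons`, index 2 to the second, and so on.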

djhurio
  • It does not work. It gives me this error message "Error in factor(trunc(df$month%%12/3) + 1, labels = seasons) : invalid labels; length 4 should be 1 or 0" – user1626688 Dec 19 '12 at 12:32
  • @user1626688 You receive this error because you did not include `month` in your data frame `df`. Both djhurio's solution and your loop will work if you add the column `month` to your data frame. – Sven Hohenstein Dec 19 '12 at 12:34
  • I am trying to run this code in my real script but it is giving me a similar error message "invalid labels; length 4 should be 1 or 1". In some of the individuals there is only data for October and November. Is this going to be an issue here with this code? – user1626688 Dec 19 '12 at 12:44
  • Strange, I don't receive the error on example data. Try to simplify if something doesn't work. Drop `factor`. Does this work? `df$season2 <- trunc(df$month %% 12 / 3) + 1` – djhurio Dec 19 '12 at 14:11
  • The example is fine. This code seems to work, but when I try to run it on real data (which does not always have all months, because some of the animals in my study were only present for 3 or 4 months) I get an error message. I am not sure if I need to have all months so that the code can specify all seasons. If, for example, the months for summer are missing, I don't know if it will still work. – user1626688 Dec 19 '12 at 22:11
  • You don't need all months represented in the data for the code to work. Did you try the version without factor, `trunc(df$month %% 12 / 3) + 1`? Does it produce another error? What is the output of `table(df$month)` from your data? – djhurio Dec 20 '12 at 04:37
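
One likely cause of the "invalid labels; length 4 should be 1 or 1" message, offered as an assumption because the real data are not shown: when `levels` is not supplied, `factor()` takes its levels from the values actually present. An animal with data only for October and November then produces a single level, and four labels no longer fit. A minimal sketch that sets the levels explicitly (the data frame `one_animal` is made up for illustration):

```r
seasons <- c("1-summer", "2-fall", "3-winter", "spring")

# Hypothetical animal observed only in October and November
one_animal <- data.frame(month = c(10, 11, 10, 11))

# Supplying levels = 1:4 keeps all four seasons as levels,
# even when some of them never occur in the data
idx <- trunc(one_animal$month %% 12 / 3) + 1
one_animal$season2 <- factor(idx, levels = 1:4, labels = seasons)

print(table(one_animal$season2))
```

With the levels fixed, seasons that are absent for an animal simply show a count of zero in the table.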

I'd just generate the lookup table for season names and apply that:

> season.names <- rep("",12)
> season.names[summer] <- "1-summer"
> season.names[fall] <- "2-fall"
> season.names[winter] <- "3-winter"
> season.names[spring] <- "4-spring"
> season.names
 [1] "1-summer" "1-summer" "2-fall"   "2-fall"   "2-fall"   "3-winter" "3-winter"
 [8] "3-winter" "4-spring" "4-spring" "4-spring" "1-summer"
> df$season <- season.names[df$month]
> head(df)
  species ind month hour     depth   season
1     ABC   1     1    0 41.643471 1-summer
2     ABC   1     2    1 36.055533 1-summer
3     ABC   1     3    2  1.901639   2-fall
4     ABC   1     4    3  7.737539   2-fall
5     ABC   1     5    4 35.327364   2-fall
6     ABC   1     6    5  9.156978 3-winter
Jonathan Dursi
  • I am trying to use the same approach for "day"/"night", but I noticed that the last "night" for hour 23 is not showing, so that code does not run properly. Any suggestions? `day<-seq(6,17,1); alltimes<-seq(0,23,1); night<-alltimes[!alltimes%in%day]; diel.names<-rep("",24); diel.names[day]<-"day"; diel.names[night]<-"night"; diel.names` – user1626688 Dec 20 '12 at 00:04
  • The problem is that the hours start at 0 and go to 23, but the vector diel.names has to be indexed from 1 to 24. One way to deal with that is just to have index `i` refer to hour `i-1`: `diel.names[day+1]<-"day" ; diel.names[night+1]<-"night"; df$diel <- diel.names[df$hour+1]`. – Jonathan Dursi Dec 20 '12 at 03:29
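
The offset described in the last comment can be written out as a small runnable sketch. The example vector `hour` is an assumption added here to exercise the edge cases 0 and 23; in the real script it would be `df$hour`.

```r
day <- seq(6, 17, 1)
alltimes <- seq(0, 23, 1)
night <- alltimes[!alltimes %in% day]

# Slot i of the lookup vector holds the label for hour i - 1,
# because R vectors are indexed from 1 while hours start at 0
diel.names <- rep("", 24)
diel.names[day + 1] <- "day"
diel.names[night + 1] <- "night"

# Example hours, including the edge cases 0 and 23
hour <- c(0, 5, 6, 17, 18, 23)
diel <- diel.names[hour + 1]
print(data.frame(hour, diel))
```

For a two-way split like this one, `ifelse(hour %in% day, "day", "night")` would avoid the offset altogether; the lookup vector is mainly useful when there are more than two categories.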