The examples I have referred to are all based on testing either numeric vectors or NA values in other columns and then adding a new variable. Here's a short reproducible example:

x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
       "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
df <- data.frame(x, y)

I want to add a season column: "winter" when df$x is dec, jan or feb; "spring" when it is march, apr or may; and likewise "summer" and "autumn" for the remaining months.

I tried

df$season <- ifelse(df[1:3, ], "winter", ifelse(df[4:6, ], "spring", 
                    ifelse(df[7:9, ], "summer", "autumn")))

which I know is a very inefficient way of doing things but I'm a newbie and a kludger. It returned the error:

Error in ifelse(df[1:3, ], "winter", ifelse(df[4:6, ], "spring",
ifelse(df[7:9,  : (list) object cannot be coerced to type 'logical'

If the same data frame had thousands of rows and I wanted to loop through it and create a new variable for season based on the month of the year, how could I do this? I referred to "Looping through a data frame to add a column depending variables in other columns", but that question loops and applies a mathematical operator to create the new variable. I also tried external resources: a thread on the R mailing list and a thread on the TalkStats forum. However, both are again based on numeric variables and conditions.

vagabond
  • this answer should help you: http://stackoverflow.com/a/22124477/3283824 – erc Mar 02 '14 at 08:01
  • @beetroot, I was just thinking the same thing ;-) – A5C1D2H2I1M1N2O1R2T1 Mar 02 '14 at 08:03
  • I would start with `Months <- substr(df$x, 1, 3)`, convert the built-in constant `month.abb` to lower case (use `tolower`), and proceed from there. – A5C1D2H2I1M1N2O1R2T1 Mar 02 '14 at 08:05
  • @beetroot yes, that will work when R recognizes the month name as `month.abb`, but in the databases we get our reports from, the month or date column is usually pretty scrambled. Could I learn a regex to add a variable with a date format native to R? Or am I over-complicating it? – vagabond Mar 02 '14 at 08:07
  • @vagabond, what do you mean "pretty scrambled"? Does it look like the `substr` approach I suggested would work (which would truncate the "x" column to the first three characters)? – A5C1D2H2I1M1N2O1R2T1 Mar 02 '14 at 08:10
  • @AnandaMahto just working on the `substr` approach. I have to sometimes join data from two campaigns. The media owners supply the data of each impression with date and time. The date and time format from one media owner to another differs considerably based on their ERP system. Some have "Jan 2014" or 01/28/14. Additionally, if I have to perform time-band analysis to check which time band is performing, I run into fields like: 20:08:48, 21:39:45 (HH:MM:SS) – vagabond Mar 02 '14 at 08:18
  • @vagabond, that's a bit of a different question. I would say that the first step would be getting all of your dates into a common format. – A5C1D2H2I1M1N2O1R2T1 Mar 02 '14 at 08:19
  • @AnandaMahto I agree that is another question. But I reckon if I learnt to categorize the record by adding a variable based on another categorical variable (add season based on date variable which is character), I'll have a more universally applicable way of overcoming this kind of challenge. – vagabond Mar 02 '14 at 08:24
  • What about this? http://stackoverflow.com/questions/9500114/find-which-season-a-particular-date-belongs-to – Roman Luštrik Mar 02 '14 at 08:52
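
A minimal base R sketch of the `substr` suggestion from the comments above. It assumes the month always comes first in each label and that its first three letters match an entry of the built-in `month.abb` constant; `season_by_month` is a hypothetical lookup vector introduced here for illustration, not something from the thread.

```r
x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
       "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
df <- data.frame(x, y)

# Truncate each label to a lower-case three-letter abbreviation,
# e.g. "march 13" becomes "mar"
months <- substr(tolower(df$x), 1, 3)

# Position of each abbreviation in jan..dec (1 to 12); NA if not found
month_num <- match(months, tolower(month.abb))

# Hypothetical lookup: one season label per calendar month, jan first
season_by_month <- c("winter", "winter",
                     "spring", "spring", "spring",
                     "summer", "summer", "summer",
                     "autumn", "autumn", "autumn",
                     "winter")

# Index the lookup by month number; unmatched labels stay NA
df$season <- season_by_month[month_num]
df
```

Indexing a lookup vector avoids nesting `ifelse` calls, and labels that do not start with a recognisable month come out as `NA` rather than being assigned to a season by mistake.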

1 Answer

If you have a really large data frame, then data.table will be very helpful for you. The following works:

library(data.table)

x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
       "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
df <- data.frame(x, y)
DT <- data.table(df)

# Reduce each label to a lower-case three-letter month abbreviation
DT[, month := substr(tolower(x), 1, 3)]

# Map the abbreviation to a season; anything unmatched becomes NA
DT[, season := ifelse(month %in% c("dec", "jan", "feb"), "winter",
               ifelse(month %in% c("mar", "apr", "may"), "spring",
               ifelse(month %in% c("jun", "jul", "aug"), "summer",
               ifelse(month %in% c("sep", "oct", "nov"), "autumn", NA))))]
DT
           x   y month season
 1:   dec 12 234   dec winter
 2:   jan 13 678   jan winter
 3:   feb 13 534   feb winter
 4: march 13 122   mar spring
 5:   apr 13 179   apr spring
 6:   may 13 987   may spring
 7:  june 13 872   jun summer
 8:  july 13 730   jul summer
 9:   aug 13 295   aug summer
10:   sep 13 450   sep autumn
11:   oct 13 590   oct autumn
12:   nov 13 312   nov autumn
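
As for the error in the question: the first argument of `ifelse` has to be a logical vector, whereas `df[1:3, ]` is a data frame (a list of columns), which is why R reports that it cannot be coerced to type 'logical'. Below is a rough sketch of the position-based version the question seems to be aiming for. It assumes the rows really are in the order shown, and `row_id` is a name introduced here for illustration.

```r
x <- c("dec 12", "jan 13", "feb 13", "march 13", "apr 13", "may 13",
       "june 13", "july 13", "aug 13", "sep 13", "oct 13", "nov 13")
y <- c(234, 678, 534, 122, 179, 987, 872, 730, 295, 450, 590, 312)
df <- data.frame(x, y)

# A row subset is still a data frame, not a TRUE/FALSE condition
is.logical(df[1:3, ])

# Build a logical test from the row positions instead
row_id <- seq_len(nrow(df))
df$season <- ifelse(row_id <= 3, "winter",
             ifelse(row_id <= 6, "spring",
             ifelse(row_id <= 9, "summer", "autumn")))
df
```

This only works while the data stay in exactly this order, so for thousands of rows the value-based test on the month abbreviation is the safer choice.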
Konstantinos
  • that works! thanks. Where'd you discover the data.table package? Should definitely do some exercises from the source. – vagabond Mar 08 '14 at 17:49
  • I discovered `data.table` only a few weeks ago, thanks to the experts here. It's magnificent. :D http://datatable.r-forge.r-project.org/ – Konstantinos Mar 09 '14 at 11:01