My second question of the day and my last attempt to use R to clean up this data. Here's the sitrep:
I have a data frame that has a column which is a POSIXlt date type. I want to extract the day, month and year from that column and create 3 new columns called (cleverly) day, month and year.
The data frame looks like this:
order_id dd_mmm_yy
1 2005-07-28
2 2007-03-04
I want to end up with this:
order_id dd_mmm_yy day month year
1 2005-07-28 28 7 2005
2 2007-03-04 4 3 2007
I've created a function to extract the day, month and year and return them in a list (or data frame, I've tried both).
extractdate = function (date) {
day = format(date, format="%d")
month = format(date, format="%m")
year = format(date, format="%Y")
list(day=day, month=month, year=year)
}
Here's what I've tried based on an earlier problem and question:
cbind(orders, t(sapply(orders$dd_mmm_yy, extractdate)))
which gives me this:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 5, 9
The t(sapply... by itself gives me this for some crazy reason:
day month year
sec Character,5 Character,5 Character,5
min Character,5 Character,5 Character,5
hour Character,5 Character,5 Character,5
mday Character,5 Character,5 Character,5
mon Character,5 Character,5 Character,5
year Character,5 Character,5 Character,5
wday Character,5 Character,5 Character,5
yday Character,5 Character,5 Character,5
isdst Character,5 Character,5 Character,5
What on earth is going on? Am I better off using something like Python or Java to do all the data manipulation I need to do on this data before bringing it into R for analysis?