Analyse data based on date

Question

I have a data frame spanning a couple of months, split into several values per day. I have assigned one column as a date (using as.Date() ), but is it possible to run operations over data for a specific day?

For example, I want to run summary() on 29-04-14 and return only a summary for this particular date.

         Date     Time   col3   col4
6084 28-04-14 23:42:30   0.48 164.00
6085 28-04-14 23:47:00   0.39 150.00
6086 28-04-14 23:51:30   0.41 164.00
6087 28-04-14 23:56:00   0.33 160.00
6088 29-04-14 00:00:00   0.32 138.00
6089 29-04-14 00:04:30   0.15 141.00
6090 29-04-14 00:09:00   0.12 133.00

I have been trying for hours and installed xts, but have had no success :(

Yes. It's possible. Provide a reproducible example showing where you're starting and where you want to end up, and I'm sure there will be several suggestions on how to get there. — A5C1D2H2I1M1N2O1R2T1, May 08 '14 at 16:48
It's impossible to tell by looking at this data what the data types of your columns are. Are they `Date`object? characters? factors? Do you want to run summary across all the columns? Only for one date or would you like to run `summary` for all dates? — MrFlick, May 08 '14 at 18:44
@MrFlick - I'm assuming the dates are a `"Date"` object as I've assigned them using `as.Date()`. The other values are a simple import via `read.csv()`. I'm only using summary() as an example, ideally I want to run `median()`, `mean()`, `hist()` + others basic functions on both col3 and col4. I just wanted to know if, considering I've defined Date as a date, if there was some nice and easy function that would spit out stats based on a specific date or date range. — ijf, May 08 '14 at 18:47
When posting data in the future, please follow the recommendations of [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — MrFlick, May 08 '14 at 18:53

score 2 · Accepted Answer · answered May 08 '14 at 19:12

Here's a possibility to get you started with the "xts" package.

Start with some sample data, as a data.frame:

mydf <- structure(list(Date = c("28-04-14", "28-04-14", "28-04-14", "28-04-14",
                                "29-04-14", "29-04-14", "29-04-14"), 
                       Time = c("23:42:30", "23:47:00", "23:51:30", "23:56:00", 
                                "00:00:00", "00:04:30", "00:09:00"), 
                       col3 = c(0.48, 0.39, 0.41, 0.33, 0.32, 0.15, 0.12), 
                       col4 = c(164, 150, 164, 160, 138, 141, 133)), 
                  .Names = c("Date", "Time", "col3", "col4"), 
                  class = "data.frame", row.names = c(NA, -7L))

Convert it to an "xts" object:

library(xts)
myxts <- xts(mydf[-c(1, 2)], 
             as.POSIXct(paste(as.Date(mydf$Date, format="%d-%m-%y"), 
                              mydf$Time), tz=""))

It now looks like this:

myxts
#                     col3 col4
# 2014-04-28 23:42:30 0.48  164
# 2014-04-28 23:47:00 0.39  150
# 2014-04-28 23:51:30 0.41  164
# 2014-04-28 23:56:00 0.33  160
# 2014-04-29 00:00:00 0.32  138
# 2014-04-29 00:04:30 0.15  141
# 2014-04-29 00:09:00 0.12  133

The "xts" package has a lot of convenient time-based functions, like apply.daily, apply.weekly, and so on.

apply.daily(myxts, mean)
#                          col3     col4
# 2014-04-28 23:56:00 0.4025000 159.5000
# 2014-04-29 00:09:00 0.1966667 137.3333

It also lets you conveniently extract a specific date or time period:

myxts["2014-04-29"]
#                     col3 col4
# 2014-04-29 00:00:00 0.32  138
# 2014-04-29 00:04:30 0.15  141
# 2014-04-29 00:09:00 0.12  133

Here's what summary looks like for that particular date:

summary(myxts["2014-04-29"])
#      Index                          col3             col4      
#  Min.   :2014-04-29 00:00:00   Min.   :0.1200   Min.   :133.0  
#  1st Qu.:2014-04-29 00:02:15   1st Qu.:0.1350   1st Qu.:135.5  
#  Median :2014-04-29 00:04:30   Median :0.1500   Median :138.0  
#  Mean   :2014-04-29 00:04:30   Mean   :0.1967   Mean   :137.3  
#  3rd Qu.:2014-04-29 00:06:45   3rd Qu.:0.2350   3rd Qu.:139.5  
#  Max.   :2014-04-29 00:09:00   Max.   :0.3200   Max.   :141.0

this looks excellent, thanks... but it hasn't worked. When I attempt: `myxts["29-04-14"]` only headers are returned. When I attempt `apply.daily(myxts$col3, median)`, for some dates there is more than one entry: e.g.`27-04-14 00:22:41 0.605 28-04-14 00:22:30 0.500 28-04-14 23:56:00 0.435 29-04-14 00:22:30 0.160 29-04-14 23:57:32 0.460` — ijf, May 08 '14 at 21:21
@ijf, an xts object is not a `data.frame`, so you con't access the columns the same way. Comments are not a good place to ask for clarification with code because the code is not reproduced properly. Either update your question or join the R Public chat room to try to get some clarification. Link back to this question and answer if you do so. — A5C1D2H2I1M1N2O1R2T1, May 09 '14 at 03:34

score 0 · Answer 2 · answered May 08 '14 at 18:52

You can treat your date like any other factor for the most part. If you want to run summary for each date, you can use the standard by function

by(data, data$Date, summary)

Date ranges are a bit more tricky. I might recommend using cut.Date or seq.Date to define data ranges and again using those as factors to collapse into groups you are interested in.

Analyse data based on date

2 Answers2