4

I have a dataset that contains 4 different events-types (A, B, C, D) that happens a lot of times daily. I have such a log for over a year. The "EventType" attribute is a 'factor'.

For eg, my dataset looks like this:

DateTime,EventType
6/5/2013 9:35,B
6/5/2013 9:35,A
6/5/2013 9:35,B
6/5/2013 9:36,D
6/5/2013 9:39,A
6/5/2013 9:40,B
7/5/2013 9:35,B
7/5/2013 9:35,A
7/5/2013 9:35,B
7/5/2013 9:36,D
7/5/2013 9:39,A
7/5/2013 9:40,B
8/5/2013 9:35,A
8/5/2013 9:35,A
8/5/2013 9:35,B
8/5/2013 9:36,B
8/5/2013 9:39,A
8/5/2013 9:40,B
9/5/2013 9:35,B
9/5/2013 9:35,B
9/5/2013 9:35,B
9/5/2013 9:36,D
9/5/2013 9:39,A
9/5/2013 9:40,A

I want to plot the total-count of all the event-types on a daily basis. x-axis: date-time, Y-axis: count.

I like to try ddply to accomplish this, but, I am not very sure how to go about it. This is what I have done:

data <- read.csv("C:/analytics/mydata.csv", sep=",", header=TRUE)
k <- ddply(data, "data$DateTime", function(x) count = nrow(x))

The above gives the following output:

       data$DateTime V1
1  6/5/2013 9:35,A  1
2  6/5/2013 9:35,B  2
3  6/5/2013 9:36,D  1
4  6/5/2013 9:39,A  1
5  6/5/2013 9:40,B  1
6  7/5/2013 9:35,A  1
7  7/5/2013 9:35,B  2
8  7/5/2013 9:36,D  1
9  7/5/2013 9:39,A  1
10 7/5/2013 9:40,B  1
11 8/5/2013 9:35,A  2
12 8/5/2013 9:35,B  1
13 8/5/2013 9:36,B  1
14 8/5/2013 9:39,A  1
15 8/5/2013 9:40,B  1
16 9/5/2013 9:35,B  3
17 9/5/2013 9:36,D  1
18 9/5/2013 9:39,A  1
19 9/5/2013 9:40,A  1

My Question: How do I achieve the same behavior if I want to get the counts by day or month? I want to use lubridate to get day or month, but, after that, I do not know how to use that to group and subsequently to get the counts.

Something like k <- ddply(data, "day(data$EventType)", function(x) count = nrow(x))

Once I have it, I can believe I can plot them nicely. Your inputs are very much appreciated.

Thanks.

user1717931
  • 2,419
  • 5
  • 29
  • 40

1 Answers1

5

There's a few ways to do this. The main thing is to make sure you are working with Date/Time classes. There is a method to round POSIX times to the day, and then you can use a number of aggregation functions to count the events per day:

#  Make sure your character strings represent date and times and then round to days
df[,1]<- as.POSIXct(df[,1],format="%d/%m/%Y %H:%M")
df$Day <- as.character( round(df[,1] , "day" ) )

Using ddply as you originally intended...

require(plyr)
ddply( df , .(Day) , summarise , Count = length(EventType) )
Day Count
1 2013-05-06     6
2 2013-05-07     6
3 2013-05-08     6
4 2013-05-09     6

A base solution....

aggregate( df , by = list(df$Day) , length )
     Group.1 DateTime EventType Day
1 2013-05-06        6         6   6
2 2013-05-07        6         6   6
3 2013-05-08        6         6   6
4 2013-05-09        6         6   6
Simon O'Hanlon
  • 58,647
  • 14
  • 142
  • 184
  • @user1717931 paste that into your question as a code block. Also paste in the output from `str(ai)` - it will help to tell what is going on. – Simon O'Hanlon Jun 19 '13 at 15:58
  • thanks for your comments. I shall try that. I noticed in my ddply output, the datetime and eventtype are kind of concatenated together...and not in separate columns. ddply does this. wonder how to separate them. – user1717931 Jun 19 '13 at 15:59