
There is a data table shown as follows:

                               timestamp       id
             785340 2016-04-01 00:01:19.000    4624
             785341 2016-04-01 00:01:19.000    4624
             785342 2016-04-01 00:02:20.000    4624
             794012 2016-04-01 00:02:21.000    4624
             18866  2016-04-01 00:02:28.142    4769
             794198 2016-04-01 00:03:31.000    4624
             18906  2016-04-01 00:03:40.130    4648
             18907  2016-04-01 00:04:40.155    4648
             18908  2016-04-01 00:05:40.157    4648
             797062 2016-04-01 00:05:48.000    4624

The table has two columns: timestamp and id. The timestamp column was generated with `df.table$timestamp <- as.POSIXct(df.table$timestamp)`. The first (unnamed) column is just the row number from the original table, since I re-ordered the table by timestamp.

Right now, I want to count the number of ids within each minute; for instance, the 1st minute has 2 ids, the 2nd minute has 3 ids, etc. In other words, I want to extract the minute information from the timestamp. The goal is to construct a time series with one-minute intervals, where the value is the number of ids falling within that interval. Is there an efficient way to do that? Thanks.
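For reference, a reproducible sketch of the sample data above (values transcribed from the table; the construction is assumed, not from the original post):

    # Rebuild the sample data shown in the question
    df.table <- data.frame(
      timestamp = c("2016-04-01 00:01:19.000", "2016-04-01 00:01:19.000",
                    "2016-04-01 00:02:20.000", "2016-04-01 00:02:21.000",
                    "2016-04-01 00:02:28.142", "2016-04-01 00:03:31.000",
                    "2016-04-01 00:03:40.130", "2016-04-01 00:04:40.155",
                    "2016-04-01 00:05:40.157", "2016-04-01 00:05:48.000"),
      id = c(4624, 4624, 4624, 4624, 4769, 4624, 4648, 4648, 4648, 4624)
    )
    # Convert to POSIXct, as in the question
    df.table$timestamp <- as.POSIXct(df.table$timestamp)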

user288609
  • It is the seconds and not the minutes. – akrun Jun 18 '16 at 03:47
  • Not sure if the minute interval is based on all the dates or specific to a date. If date is specific, then `aggregate(id~timestamp, data = transform(df.table, format(timestamp, "%Y-%m-%d %H:%M")), FUN = length)` – akrun Jun 18 '16 at 04:22
  • I just show a subsection of data. It can include two months. Therefore, there will have 60*24*30*2 minutes. Thanks. – user288609 Jun 18 '16 at 04:24
  • If it is for the full dates and not specific to a data, change the format to `format(timestamp, "%M")` – akrun Jun 18 '16 at 04:29

2 Answers


We can do this without any packages

aggregate(cbind(count=id)~ts_min, data = 
        transform(df.table, ts_min=format(timestamp, "%M")), FUN = length)
#   ts_min count
#1     01     2
#2     02     3
#3     03     2
#4     04     1
#5     05     2
akrun
  • Thanks for the answer, it works fine. What if I want to get the count for each specific id? For instance, the 1st minute has 2 "4624"; the 2nd minute has 2 "4624" and 1 "4769". I tried something like `aggregate(cbind(count=(id=="4624"))~..` but it did not work. – user288609 Jun 19 '16 at 22:15
  • @user288609 Try with `aggregate(cbind(count=id)~ts_min + id, data = ...` – akrun Jun 20 '16 at 01:30
  • Thanks for your answer. I tried it, but it is not what I want. Maybe I should have explained my question more clearly. I posted this problem in the following thread: https://stackoverflow.com/questions/37914133/count-the-frequences-of-several-variables-for-a-given-category – user288609 Jun 20 '16 at 03:15

You could use lubridate and dplyr.

library(dplyr)
library(lubridate)

df.table %>%
    group_by(ts_min = minute(timestamp)) %>%
    summarise(count = n())

## Source: local data frame [5 x 2]
## 
##   ts_min count
##    (int) (int)
## 1      1     2
## 2      2     3
## 3      3     2
## 4      4     1
## 5      5     2
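Note that `minute()` extracts only the minute-of-hour (0–59), so with data spanning two months (as the question mentions), counts from different hours and days would be merged into the same bucket. A minimal sketch using `lubridate::floor_date` instead, which keeps each one-minute interval distinct (assuming `df.table` as defined in the question):

    library(dplyr)
    library(lubridate)

    # Truncate each timestamp to the start of its minute, then count per interval;
    # intervals from different hours/days remain separate buckets
    df.table %>%
        group_by(interval = floor_date(timestamp, unit = "minute")) %>%
        summarise(count = n())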
steveb