0

I have this dataset.

df <- data.frame(c("Attribute1", "Attribute1", "Attribute1", "Attribute2", "Attribute2"),
                 c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                   "2018-11-01 00:01:23", "2018-11-01 00:01:25"))
names(df) <- c("Signature", "date")
df$date <- as.POSIXct(df$date)

I would like to know how to count the same attribute within the past 1 hour in R programming, this would be the result I would like to have:

enter image description here

Then the Count_Signature will count the how much "Attribute 1" were in the past hour and so on.

Thank you

Z.Lin
  • 28,055
  • 6
  • 54
  • 94
  • 5
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because then we have to retype everything just to test it. – MrFlick Jan 03 '19 at 16:38
  • Hello @MrFlick I did an example, please let me know if it works good – ScottUrbina Jan 03 '19 at 18:34

1 Answers1

0

Here's a solution for you. I leveraged data.table because it has some nice time functionality and is highly performant at group-by calculations. I created a time index, collapsing based on year, year day, and binning by hour. You would need to modify this if you intend to group by the "last hour" of R programming based off system time. In that case, Sys.time() can be your friend.

Anyhow, here's the solution:

df <- data.frame(c("Attribute1", "Attribute1", "Attribute1", "Attribute2", "Attribute2"),
                 c("2018-11-01 00:00:19", "2018-11-01 00:00:54", "2018-11-01 00:01:17",
                   "2018-11-01 00:01:23", "2018-11-01 00:01:25"))
names(df) <- c("Signature", "date")
df$date <- as.POSIXct(df$date)

library(data.table)
dt <- setDT(df)
dt[, time_idx := paste0(year(date), "-", yday(date), "-", hour(date))]
dt[, Count_Signature := (1L:.N) - 1L, keyby = .(Signature, time_idx)]
dt
#>     Signature                date   time_idx Count_Signature
#> 1: Attribute1 2018-11-01 00:00:19 2018-305-0               0
#> 2: Attribute1 2018-11-01 00:00:54 2018-305-0               1
#> 3: Attribute1 2018-11-01 00:01:17 2018-305-0               2
#> 4: Attribute2 2018-11-01 00:01:23 2018-305-0               0
#> 5: Attribute2 2018-11-01 00:01:25 2018-305-0               1

Created on 2019-01-03 by the reprex package (v0.2.1)

Chase
  • 67,710
  • 18
  • 144
  • 161
  • I think this is what I need, since I am looking the last hour of each attribute, a quick question , how can I add to this an unique identifier? – ScottUrbina Jan 04 '19 at 14:30
  • Not sure I follow - unique in that you have another column that already represents your unique identifier? Or you need to add one? If you need to add one, I would simply add something like a sequence from 1:N – Chase Jan 04 '19 at 14:46
  • I think I got it, thanks a lot – ScottUrbina Jan 04 '19 at 19:01
  • Hello, a quick question, how can I do the same but instead last hour, I need also last half hour. Thanks – ScottUrbina Jan 07 '19 at 12:49
  • https://stat.ethz.ch/pipermail/r-help/2012-June/315336.html – Chase Jan 07 '19 at 15:53