
My question is about time series data. Suppose I have one file, named P1, with the columns Time.Stamp and Value. The data table is given below:

Time.Stamp         Value
01/01/2017 19:08   12
01/01/2017 19:08   24
01/01/2017 19:08   45
01/01/2017 19:08   56
01/01/2017 19:08   78
01/01/2017 19:08   76
01/01/2017 19:08   34
01/01/2017 19:09   65
01/01/2017 19:09   87

I have another, separate file, named P2, which has two columns, “Transaction from” and “transaction to”:

Transaction from   transaction to
01/01/2017 19:00   01/01/2017 19:15
01/01/2017 19:15   01/01/2017 19:30
02/01/2017 08:45   02/01/2017 09:00
02/01/2017 09:00   02/01/2017 09:15
02/01/2017 09:15   02/01/2017 09:30
02/01/2017 09:30   02/01/2017 09:45
03/01/2017 18:00   03/01/2017 18:15
03/01/2017 18:15   03/01/2017 18:30
03/01/2017 23:45   04/01/2017 00:00
04/01/2017 00:15   04/01/2017 00:30
04/01/2017 01:45   04/01/2017 02:00

Now I want to determine, in R, which “Time.Stamp” values from file P1 fall within a “Transaction from” to “transaction to” interval of file P2. Whenever a “Time.Stamp” lies within such an interval, the Value associated with it should be aggregated. The two files are not of equal length: P1 is much longer than P2. (For the sample data above, all nine timestamps fall in the first interval, 19:00 to 19:15, so its aggregated Value would be 477.)

It would be very helpful if anyone could suggest a solution in R.

andrew_reece

3 Answers


This is a possible duplicate of How to perform join over date ranges using data.table? Assuming that P1 and P2 are data frames and the dates are POSIXct from the beginning, here is the lifesaver join provided by data.table:

library(data.table)

setDT(P1)
setDT(P2)

# foverlaps() joins intervals, so duplicate the timestamp to form a
# zero-width interval in P1
P1[ , dummy := Time.Stamp]

# the interval columns of P2 must be keyed for the overlap join
setkey(P2, Transaction.from, transaction.to)

dt <- foverlaps(
  P1,
  P2,
  by.x = c("Time.Stamp", "dummy"),
  # mult = "first" / mult = "last" would keep only the first/last match
  nomatch = 0L   # drop timestamps that match no interval
)[ , dummy := NULL]

# you can run ?data.table::foverlaps for the documentation 

Please refer to this great blog post for a step-by-step explanation and other possible answers.

After this point you can simply:

library(dplyr)

dt %>%
  group_by(Transaction.from) %>%
  mutate(total = sum(Value))

Please note that this solution may seem long for the simple aggregation you asked for. However, it will come in very handy if you need to merge the data frames and conduct more complex analysis.
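If one row per interval is enough, a summarise() variant of the same pipeline (a sketch, assuming the dt built above and dplyr >= 1.0 for the .groups argument) collapses each group to a single total:

# one aggregated row per P2 interval instead of a repeated total column
dt %>%
  group_by(Transaction.from, transaction.to) %>%
  summarise(total = sum(Value), .groups = "drop")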

OzanStats

First, convert all dates with as.POSIXct(x, format = "%d/%m/%Y %H:%M").
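A minimal sketch of that conversion step (assuming p1 and p2 still hold the raw character columns, with day-first dates as the format string implies):

# parse the character columns into POSIXct so comparisons work
p1$Time.Stamp         <- as.POSIXct(p1$Time.Stamp, format = "%d/%m/%Y %H:%M")
p2$`Transaction from` <- as.POSIXct(p2$`Transaction from`, format = "%d/%m/%Y %H:%M")
p2$`transaction to`   <- as.POSIXct(p2$`transaction to`, format = "%d/%m/%Y %H:%M")

Then check whether each element of p1$Time.Stamp lies in any period from p2[,1] to p2[,2] with the following function, and aggregate: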

isitthere <- function(x, from = p2$`Transaction from`, to = p2$`transaction to`) {
  any(x >= from & x <= to)
}

Apply the function to all p1$Time.Stamp:

index <- sapply(p1$Time.Stamp, isitthere, from = p2$`Transaction from`, to = p2$`transaction to`)
index
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Now aggregate:

sum(p1$Value[index])
[1] 477
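
If a separate total is wanted for each interval of p2 rather than one grand total, a mapply() variant of the same idea (a sketch under the same assumptions) would be:

# sum the Values that fall inside each individual p2 interval
totals <- mapply(
  function(from, to) sum(p1$Value[p1$Time.Stamp >= from & p1$Time.Stamp <= to]),
  p2$`Transaction from`,
  p2$`transaction to`
)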
Iman
  • This solution works for a relatively small amount of data, but for large amounts it takes some time... still, the solution was helpful for understanding. Thanks a lot. – Rezwan Munar Aug 06 '18 at 13:19
  • @Md.RezwanIslamMunar So use `C` language for loops. – Iman Aug 06 '18 at 16:34

I am not clear about what is to be aggregated by what, but assuming that DF1 and DF2 are as defined in the Note at the end, the query below will, for each row in DF2, look up zero or more rows in DF1 and then sum Value over the rows having the same Transaction.from and Transaction.to.

library(sqldf)

sqldf("select [Transaction.from], [Transaction.to], sum(Value) as Value
       from DF2
       left join DF1 on [Time.Stamp] between [Transaction.from] and [Transaction.to]
       group by [Transaction.from], [Transaction.to]")

giving:

      Transaction.from      Transaction.to Value
1  2017-01-01 19:00:00 2017-01-01 19:15:00   477
2  2017-01-01 19:15:00 2017-01-01 19:30:00    NA
3  2017-02-01 08:45:00 2017-02-01 09:00:00    NA
4  2017-02-01 09:00:00 2017-02-01 09:15:00    NA
5  2017-02-01 09:15:00 2017-02-01 09:30:00    NA
6  2017-02-01 09:30:00 2017-02-01 09:45:00    NA
7  2017-03-01 18:00:00 2017-03-01 18:15:00    NA
8  2017-03-01 18:15:00 2017-03-01 18:30:00    NA
9  2017-03-01 23:45:00 2017-04-01 00:00:00    NA
10 2017-04-01 00:15:00 2017-04-01 00:30:00    NA
11 2017-04-01 01:45:00 2017-04-01 02:00:00    NA
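
If a 0 is preferred over NA for intervals containing no timestamps, SQLite's coalesce() can wrap the sum (a sketch against the same DF1 and DF2):

sqldf("select [Transaction.from], [Transaction.to], coalesce(sum(Value), 0) as Value
       from DF2
       left join DF1 on [Time.Stamp] between [Transaction.from] and [Transaction.to]
       group by [Transaction.from], [Transaction.to]")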

Note

Lines1 <- "
Time.Stamp,Value
01/01/2017 19:08,12
01/01/2017 19:08,24
01/01/2017 19:08,45
01/01/2017 19:08,56
01/01/2017 19:08,78
01/01/2017 19:08,76
01/01/2017 19:08,34
01/01/2017 19:09,65
01/01/2017 19:09,87
"
DF1 <- read.csv(text = Lines1)
fmt <- "%m/%d/%Y %H:%M"
DF1 <- transform(DF1, Time.Stamp = as.POSIXct(Time.Stamp, format = fmt))

Lines2 <- "
Transaction.from,Transaction.to
01/01/2017 19:00,01/01/2017 19:15
01/01/2017 19:15,01/01/2017 19:30
02/01/2017 08:45,02/01/2017 09:00
02/01/2017 09:00,02/01/2017 09:15
02/01/2017 09:15,02/01/2017 09:30
02/01/2017 09:30,02/01/2017 09:45
03/01/2017 18:00,03/01/2017 18:15
03/01/2017 18:15,03/01/2017 18:30
03/01/2017 23:45,04/01/2017 00:00
04/01/2017 00:15,04/01/2017 00:30
04/01/2017 01:45,04/01/2017 02:00
"
DF2 <- read.csv(text = Lines2)
DF2 <- transform(DF2, Transaction.from = as.POSIXct(Transaction.from, format = fmt),
                      Transaction.to = as.POSIXct(Transaction.to, format = fmt))
G. Grothendieck