-1

I want to have the sum() of counts (rows) between a specific day. I found some solutions on stack, but the point is that my second data frame is much bigger then data frame one.

Data set one

dim(foo1) # 600 / 2

Start                      End
2017-10-24 22:33:59   2017-10-24 22:43:59
2017-11-13 06:34:59   2017-11-13 06:44:59
2017-11-13 06:52:00   2017-11-13 07:02:00
2017-11-13 07:16:59   2017-11-13 07:26:59
2017-11-13 07:35:59   2017-11-13 07:45:59

Data set two

dim(foo2) # 60.000 / 2

Count           Time
1              2017-10-01 13:45:02
1              2017-10-01 12:53:23
1              2017-10-01 12:20:56
1              2017-10-01 12:31:12

I want the sum, of all the rows (Count) from foo2, appearing between the Start and End date in foo1). Result should be Foo1 + new_column (containing counts)

This is my beginning 'solution' that doesn't work:

for(i in 1:nrow(foo1)){
  foo1$new_column[i] <-sum(foo2$Count[which( 
    foo2$Time >= foo2$Start[i] &
      foo2$Time <= foo2$End[i])]) 
}
Loesje
  • 3
  • 3
  • Are you asking exaclty the same question as this [one here](https://stackoverflow.com/questions/51301948/r-combine-two-dataframes-with-calculation-range/51305055#51305055)? You even have the same time frames and non-overlap. – phiver Jul 13 '18 at 12:45
  • Sample data seems incorrect. No timestamp from `foo2$Time` falls within a range of `foo1$Start` - `foo1$End` – Wimpel Jul 13 '18 at 12:45
  • I guess we are in the same class phiver. Sorry for not finding it. – Loesje Jul 13 '18 at 13:11

2 Answers2

1

There seems to be a problem in your sample-data, since no Time from foo2 (all on 2017-10-01) falls within a time-interval from foo1 (ranges start at 2017-10-24) .

For this answer I created my own sample data.

library(data.table)

foo1 <- data.table( Start = c("2017-10-24 22:33:59", "2017-11-13 06:34:59", "2017-11-13 06:52:00", "2017-11-13 07:16:59", "2017-11-13 07:35:59"),
                    End = c("2017-10-24 22:43:59", "2017-11-13 06:44:59", "2017-11-13 07:02:00", "2017-11-13 07:26:59", "2017-11-13 07:45:59"),
                    stringsAsFactors = FALSE)

#                  Start                 End
# 1: 2017-10-24 22:33:59 2017-10-24 22:43:59
# 2: 2017-11-13 06:34:59 2017-11-13 06:44:59
# 3: 2017-11-13 06:52:00 2017-11-13 07:02:00
# 4: 2017-11-13 07:16:59 2017-11-13 07:26:59
# 5: 2017-11-13 07:35:59 2017-11-13 07:45:59

foo2 <- data.table( Count = c(1,1,1,1),
                    Time = c("2017-10-24 22:37:02", "2017-10-24 22:38:23", "2017-11-13 07:20:56", "2017-10-01 12:31:12"),
                    stringsAsFactors = FALSE)

#    Count                Time
# 1:     1 2017-10-24 22:37:02
# 2:     1 2017-10-24 22:38:23
# 3:     1 2017-11-13 07:20:56
# 4:     1 2017-10-01 12:31:12

#set times as POSIXct
foo1[, Start := as.POSIXct(Start, format = "%Y-%m-%d %H:%M:%S")]
foo1[, End := as.POSIXct(End, format = "%Y-%m-%d %H:%M:%S")]
foo2[, Time :=  as.POSIXct(Time, format = "%Y-%m-%d %H:%M:%S")]

#add a dummy-column to create a time-range (of 1 second)
foo2[, dummy := Time]

#set data.table keys
setkey(foo1, Start, End)
setkey(foo2, Time, dummy)

#overlap-join, lose the dummy-column
foo3 <- foverlaps(foo2, foo1, type = "within", mult = "first", nomatch = 0L)[, dummy := NULL]

#                  Start                 End Count                Time
# 1: 2017-10-24 22:33:59 2017-10-24 22:43:59     1 2017-10-24 22:37:02
# 2: 2017-10-24 22:33:59 2017-10-24 22:43:59     1 2017-10-24 22:38:23
# 3: 2017-11-13 07:16:59 2017-11-13 07:26:59     1 2017-11-13 07:20:56

foo3[, sum(Count), by = "Start"]
#                  Start V1
# 1: 2017-10-24 22:33:59  2
# 2: 2017-11-13 07:16:59  1
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Thanks, but this solutions doesn't add a column to my first data frame with there the columns, right? Result is a data frame (containing the columns: Start, End, Count (which is everywhere 1), time and dummy)... – Loesje Jul 13 '18 at 14:50
  • @Loesje count still exists, you you can sum op anything you like...Just use `summarise` or data.tables `by=` to sum counts... – Wimpel Jul 13 '18 at 15:19
  • Even after a group_by(Start) and then summarise(Count), the counts are not matching unfortunately... Some of them do, but the others are stuck at a count for example 1... – Loesje Jul 13 '18 at 15:35
  • updates answer with some data that actually makes sense – Wimpel Jul 13 '18 at 16:17
0

Since your original data set doesn't appear to have any overlap I added an additional row in my example. I used dplyr mutate to add a column that contained a rowwise between comparison of each Start and End to the entire list of foo2$Time then I just summed the foo2$Count for the resulting set.

library(dplyr)
foo2 <- foo2 %>% add_row(Count = 3, Time = as.Date("2017-10-24 22:35:00", tz = "UTC"))
foo1 %>% rowwise() %>%  mutate(Count = sum(foo2$Count[between(as.Date(foo2$Time), as.Date(Start), as.Date(End))]))

#     Source: local data frame [500 x 3]
# Groups: <by row>
# 
# A tibble: 500 x 3
#    Start               End                 Count
#    <dttm>              <dttm>              <dbl>
#  1 2017-10-24 22:33:59 2017-10-24 22:43:59  3.00
#  2 2017-11-13 06:34:59 2017-11-13 06:44:59  0   
#  3 2017-11-13 06:52:00 2017-11-13 07:02:00  0   
#  4 2017-11-13 07:16:59 2017-11-13 07:26:59  0   
#  5 2017-11-13 07:35:59 2017-11-13 07:45:59  0   
#  6 2017-11-13 09:46:00 2017-11-13 09:56:00  0   
#  7 2017-11-13 10:46:00 2017-11-13 10:56:00  0   
#  8 2017-11-13 11:11:00 2017-11-13 11:21:00  0   
#  9 2017-11-13 13:33:00 2017-11-13 13:43:00  0   
# 10 2017-11-13 13:50:59 2017-11-13 14:00:59  0   
# # ... with 490 more rows
jasbner
  • 2,253
  • 12
  • 24