0

So currently I am able to calculate a daily max for one site using the following code:

library('dplyr')
library('data.table')
library('tidyverse')
library('tidyr')
library('lubridate')

funcVolume <- function(max_data$enter_yard, max_data$exit_yard)

{
vecOnes <- array(1,c(length(max_data$enter_yard),1))
vecTime <- c(max_data$enter_yard,max_data$exit_yard)
vecCount <- c(vecOnes,-vecOnes)
df_test <- data.frame(T = vecTime, Count = vecCount)
df_test <-  df_test %>%
arrange(T) %>%
mutate(Volume = cumsum(Count))
df_test
}

df_test2 <- df_test

df_test2$date <- as.Date(format(df_test$T, "%Y-%m-%d"))

df_test3 <- df_test2

df_test3 <- tibble(x = df_test2$Volume, y = df_test2$date) %>%
  arrange(y)

dataset <- df_test3 %>%
  group_by(y) %>%
  dplyr::filter(x == max(x)) %>%
  distinct(x,.keep_all = T) %>%
  ungroup()

However, I would like to do this for multiple locations. In my original dataframe, I have a column that lists the name of the site, and two columns for when an object enter or leaves a site. The name is just a general text column, and the other two columns are datetime columns. Ideally, I would want an output that looks like the following:

Date | Max Count | Site
 x        y         z
 x        a         b

I also have a couple million rows of data, so I need something that can run in a reasonable time frame.

mumair
  • 51
  • 6

0 Answers0