I have a problem linking two dataframes in R. Both dataframes have a time variable, but the times are not exactly the same in both files: when the actual time is, e.g., 13:05, one file gives 13:05 while the other gives 13:07. Merging on time, which was our original plan, is therefore not possible, so we had to come up with an alternative.
One dataframe contains the measurements (recorded twice per second, continuously, whether or not an animal is present); the other contains the ID of each animal and the duration of its presence. We would like to match these dataframes so that each measurement is linked to the right animal. We assume that the measurements are, on average, higher when an animal is present than when it is not. So I am looking for some sort of sliding highest-mean function that can merge the dataframes.
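A minimal sketch of a sliding highest-mean search in base R. The function name `highest_mean_window` is hypothetical; it assumes the measurements are a numeric vector in time order and that the presence duration has already been converted to a number of measurements (at two measurements per second, that would be 2 × the duration in seconds, assuming BoxTime is in seconds, which the data does not state). Cumulative sums give every window sum in one pass, so it should scale to a full day of data.

```r
# Hypothetical helper: find the window of length k with the highest mean.
# Assumes x is a numeric vector in time order and k is a number of measurements.
highest_mean_window <- function(x, k) {
  stopifnot(is.numeric(x), k >= 1, k <= length(x))
  cs <- c(0, cumsum(x))                       # cs[i + 1] = sum(x[1:i])
  # Sum of x[i:(i + k - 1)] for every valid start i, computed in O(n)
  window_sums <- cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]
  start <- which.max(window_sums)
  list(start = start, end = start + k - 1, mean = window_sums[start] / k)
}

# Toy series with a block of high values at positions 4 to 6
x <- c(1, 2, 1, 9, 8, 9, 2, 1)
highest_mean_window(x, 3)  # start = 4, end = 6, mean = 26/3
```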
I have tried Mclust, but this package cannot handle the amount of data I have per day. In addition, it is nearly impossible to link the resulting clusters to the right ID. K-means (kmeans) was also considered, but it gave clusters that were not contiguous in time (e.g. 5 different clusters for 5 successive measurements).
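One way to get clusters that stay together over time, sketched here as an alternative to kmeans rather than a fix for it, is to threshold the series and take runs of consecutive high values with `rle()`. The helper `presence_segments` and the single fixed threshold are assumptions for illustration; on noisy data the series would probably need smoothing first (e.g. a running mean via `stats::filter`), otherwise a short dip splits one visit into two segments.

```r
# Hypothetical helper: contiguous runs of measurements above a threshold.
# Assumption: one threshold separates "animal present" (higher) from "absent".
presence_segments <- function(x, threshold) {
  high <- x > threshold
  runs <- rle(high)                        # lengths of consecutive TRUE/FALSE runs
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  seg <- data.frame(start = starts, end = ends, present = runs$values)
  seg[seg$present, c("start", "end")]      # keep only the high (presence) runs
}

x <- c(1, 2, 1, 9, 8, 9, 2, 1, 7, 8)
presence_segments(x, threshold = 5)  # runs 4-6 and 9-10
```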
Here is a short reproducible example:
```r
library(mclust)  # provides Mclust()

# Dataset containing the animal ID and its time in the system (BoxTime)
ID <- c("111", "222", "212")
BoxTime <- c(19, 76, 14)  # numeric, not character
df <- data.frame(ID, BoxTime)

# Dataset containing observations and time
# (add the time column first: faithful itself only has two columns)
faithful$time <- seq_len(nrow(faithful))
dataset <- faithful[, c(1, 3)]  # eruptions and time
output <- Mclust(dataset)  # Works here, but not on my complete dataset!
```
Mclust cannot do the job on my real data because the clusters there are not as clear-cut as in this example dataset (see ?faithful for more information on it).
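A greedy sketch that combines the sliding highest mean with the toy data: for each animal in turn, pick the highest-mean window of length BoxTime that starts after the previous animal's window and still leaves room for the animals that follow. This assumes the rows of `df` are in visit order, that visits do not overlap, and (for the example only) that BoxTime already counts measurements; `assign_ids` is a hypothetical helper, not an existing function.

```r
# Hypothetical helper: label each measurement with the ID assumed present.
# Assumes ids/box_time are in visit order, visits do not overlap, and
# box_time is a number of measurements (not seconds).
assign_ids <- function(x, ids, box_time) {
  ids <- as.character(ids)
  n <- length(x)
  if (sum(box_time) > n) stop("Total BoxTime exceeds the number of measurements")
  cs <- c(0, cumsum(x))                  # cs[i + 1] = sum(x[1:i])
  label <- rep(NA_character_, n)
  from <- 1                              # earliest allowed start for the next animal
  for (j in seq_along(ids)) {
    k <- box_time[j]
    # Latest start that still leaves room for this and all following animals
    last_start <- n - sum(box_time[j:length(box_time)]) + 1
    starts <- from:last_start
    window_sums <- cs[starts + k] - cs[starts]
    best <- starts[which.max(window_sums)]
    label[best:(best + k - 1)] <- ids[j]
    from <- best + k
  }
  label
}

df <- data.frame(ID = c("111", "222", "212"), BoxTime = c(19, 76, 14))
obs <- faithful$eruptions
merged <- data.frame(time = seq_along(obs), eruptions = obs,
                     ID = assign_ids(obs, df$ID, df$BoxTime))
table(merged$ID, useNA = "ifany")  # 19, 14 and 76 labelled rows; 163 stay NA
```

A greedy choice can take a window that actually belongs to a later animal; an exhaustive or dynamic-programming search over all placements would avoid that, at a higher computational cost.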
Any ideas are welcome!