
I have a problem linking two dataframes in R. Both dataframes have a time variable; however, we know that the clocks are not in sync (e.g. when the actual time is 13:05, one file records 13:05 while the other records 13:07). Merging on time, which was our original plan, is therefore not possible, so we had to come up with an alternative.

One dataframe contains the measurements (taken twice per second, continuously, whether or not an animal is present); the other contains the ID of each animal and the duration of its presence. We would like to match these dataframes so that each measurement is linked to the right animal. We assume that the measurements are, on average, higher when an animal is present than when none is present. So I am looking for some sort of sliding "highest mean" function: one that finds, for each animal, the stretch of measurements of the recorded duration with the highest mean, so that the dataframes can be merged on that basis.
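To make the idea concrete, here is a minimal sketch of such a sliding highest-mean search in base R. It rests on several assumptions that are not from our real data: the helper names `rolling_mean` and `best_window_start` are made up for illustration, the toy `signal` has one clean elevated stretch, and `BoxTime` is taken to be in seconds with 2 samples per second. It handles a single visit only.

```r
# Rolling mean over every window of `width` consecutive samples,
# computed from cumulative sums (base R only, no packages needed).
rolling_mean <- function(x, width) {
  stopifnot(width >= 1, width <= length(x))
  cs <- c(0, cumsum(x))
  (cs[(width + 1):length(cs)] - cs[1:(length(cs) - width)]) / width
}

# Index of the first sample of the window with the highest mean.
best_window_start <- function(x, width) {
  which.max(rolling_mean(x, width))
}

# Toy measurements (assumption): baseline 0, elevated to 5 for samples 41 to 60.
signal <- c(rep(0, 40), rep(5, 20), rep(0, 40))

# One hypothetical visit; BoxTime assumed to be in seconds.
visits <- data.frame(ID = "111", BoxTime = 10, stringsAsFactors = FALSE)
samples_per_second <- 2
width <- visits$BoxTime[1] * samples_per_second  # 20 samples

start <- best_window_start(signal, width)
visits$start_index <- start              # first sample assigned to this animal
visits$end_index <- start + width - 1    # last sample assigned to this animal
```

With several animals, the search would have to be repeated per visit while excluding samples that are already assigned and respecting the order of the visits; that part is not covered by this sketch.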

I have tried Mclust (from the mclust package), but it cannot deal with the large amount of data I have per day. In addition, it is nearly impossible to link the resulting clusters to the right ID. k-means was also considered, but it gave clusters that were not contiguous in time (e.g. 5 different clusters for 5 successive measurements).

Here is a short reproducible example:

library(mclust)  # provides Mclust()
# dataset containing ID and time in system (BoxTime)
ID <- c("111", "222", "212")
BoxTime <- c(19, 76, 14)  # numeric, so durations can be used in calculations
df <- data.frame(ID, BoxTime)
# dataset containing observations and time (faithful ships with R)
faithful$time <- seq_len(nrow(faithful))  # add the time column before selecting it
dataset <- faithful[, c(1, 3)]            # eruptions and time
output <- Mclust(dataset) ## This does not work for my complete dataset!!

Mclust is not capable of doing the job on my real data, as the clustering there is not as clear-cut as in the example dataset (see ?faithful for more info on the dataset).

Ideas are welcome!

  • Please supply a short reproducible example. – J.R. Oct 14 '14 at 13:54
  • Please read the info about how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). This will make it much easier for others to help you. – Jaap Oct 14 '14 at 13:55
  • I have added a reproducible example, but Mclust performs perfectly on it. I cannot make a reproducible example containing 172800 observations on approximately 200 cows (for each day) that is as difficult to cluster as my data. – Sabine van Engelen Oct 14 '14 at 14:13
  • You need to add faithful in order for others to run the example. Also, [caTools has a running max](http://svitsrv25.epfl.ch/R-doc/library/caTools/html/runminmax.html) you should try to apply to your sample data. –  Oct 14 '14 at 14:19

0 Answers