
I have a dataframe of individuals that have been followed with GPS collars. To check whether those individuals are independent in their movements or follow each other, I want to associate each point (each row) of one individual with every point of the other individuals that falls in a 12-hour interval around that first point, and then calculate how often they are less than 100 m apart, for example.

My dataframe: Data_real

 'data.frame':  57471 obs. of  8 variables:
 $ Elephant         : Factor w/ 17 levels "Bull","Bull (one tusk)",..: 1 1 1 1 1 ...
 $ Date.time        : POSIXct, format: "2015-10-06 14:38:00" "2015-10-06 18:37:00" "2015-10-06 22:37:00" "2015-10-07 02:37:00" ...
 $ Date             : POSIXct, format: "2015-10-06" "2015-10-06" ...
 $ Date_month       : chr  "2015-10" "2015-10" "2015-10" "2015-10" ...
 $ Date.time_plus6h : POSIXct, format: "2015-10-06 20:38:00" "2015-10-07 00:37:00" ...
 $ Date.time_minus6h: POSIXct, format: "2015-10-06 08:38:00" "2015-10-06 12:37:00" ...
 $ coords.x1        : num  329468 329393 328341 327563 327271 ...
 $ coords.x1.1      : num  329468 329393 328341 327563 327271 ...


Elephant             Date.time coords.x1 coords.x1.1 Date_month    Date.time_plus6h   Date.time_minus6h
0     Bull 2015-10-06 14:38:00  329467.6    329467.6    2015-10 2015-10-06 20:38:00 2015-10-06 08:38:00
1     Bull 2015-10-06 18:37:00  329392.5    329392.5    2015-10 2015-10-07 00:37:00 2015-10-06 12:37:00
2     Bull 2015-10-06 22:37:00  328341.3    328341.3    2015-10 2015-10-07 04:37:00 2015-10-06 16:37:00
3     Bull 2015-10-07 02:37:00  327562.9    327562.9    2015-10 2015-10-07 08:37:00 2015-10-06 20:37:00
4     Bull 2015-10-07 06:37:00  327271.0    327271.0    2015-10 2015-10-07 12:37:00 2015-10-07 00:37:00
5     Bull 2015-10-07 14:38:00  322977.5    322977.5    2015-10 2015-10-07 20:38:00 2015-10-07 08:38:00

At first, I tried an inner_join by Date and then calculated the distance between each pair of joined points.

library(dplyr)

Association1 <- NULL
for (id in unique(Data_real$Elephant)) {
  id1 <- Data_real[Data_real$Elephant == id, ]  # one individual
  id2 <- Data_real[Data_real$Elephant != id, ]  # all the others

  all.id <- inner_join(id2, id1, by = "Date")
  deltaX <- (all.id$coords.x1.y   - all.id$coords.x1.x) ^ 2
  deltaY <- (all.id$coords.x1.1.y - all.id$coords.x1.1.x) ^ 2
  all.id$distance <- sqrt(deltaX + deltaY)  # distance in metres

  Association1 <- rbind(Association1, all.id)

  # drop this individual so each pair is only compared once
  Data_real <- Data_real[Data_real$Elephant != id, ]
}

The problem with this is that if an individual has a point at 23:55, for example, it might be more related to points of the next day than to points of the same day, which is why I want to use a time interval around each point to remove this bias. I searched, and I don't think a plain join can do that. On another question on this forum, someone suggested using filter, which I tried on my data. It is not perfect either, because associations of points at the beginning and end of a month may still be biased, but it is better than joining by day...

all.id <- inner_join(id2, id1, by = "Date_month")
all.id <- as_tibble(all.id)
all.id2 <- filter(all.id, Date.time.y >= Date.time_minus6h.x &
                          Date.time.y <= Date.time_plus6h.x)

The major problem is that the command either doesn't work the way I coded it or takes far too long to finish.

I read on different forums that functions in the data.table package might work for me, but I don't understand how, and I'm not sure they are meant for this kind of manipulation.
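For reference, the data.table approach people usually mean here is foverlaps(), which does an overlap join on intervals. A minimal sketch with made-up data (the second elephant, "Tusker", and all fix times/coordinates are invented for illustration): each point becomes a zero-width interval, and it is matched against every other individual's ±6 h window.

```r
library(data.table)

# Hypothetical two-elephant example (times in POSIXct, coordinates in metres)
dt <- data.table(
  Elephant  = c("Bull", "Bull", "Tusker", "Tusker"),
  Date.time = as.POSIXct(c("2015-10-06 14:38:00", "2015-10-06 18:37:00",
                           "2015-10-06 16:00:00", "2015-10-07 09:00:00"),
                         tz = "UTC"),
  coords.x1   = c(329468, 329393, 329500, 340000),
  coords.x1.1 = c(329468, 329393, 329500, 340000)
)

# foverlaps() needs interval columns; a single fix is a zero-width interval
x <- dt[, .(Elephant, Date.time, coords.x1, coords.x1.1,
            start = Date.time, end = Date.time)]
# the other side carries the +/-6 h window around each fix
y <- dt[, .(Elephant, Date.time, coords.x1, coords.x1.1,
            start = Date.time - 6 * 3600, end = Date.time + 6 * 3600)]
setkey(y, start, end)

# every point of x falling inside another individual's window
# (columns coming from x are prefixed with "i.")
ov <- foverlaps(x, y)[Elephant != i.Elephant]
ov[, distance := sqrt((coords.x1   - i.coords.x1)^2 +
                      (coords.x1.1 - i.coords.x1.1)^2)]
```

Because every point is tested against a window centred on each other fix, there is no month-boundary (or day-boundary) bias as with the join-by-period approach.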

So my question is: do you know a good way to join two dataframes so that each point of one individual is associated with every point of all other individuals that falls within +6/-6 hours of that first point's time? If possible, not the way I tried, because we still get biased values at the end and beginning of the month.

Thank you in advance for your help! :)

  • Did you see this question: https://stackoverflow.com/questions/25815032/finding-overlaps-between-interval-sets-efficient-overlap-joins – Bulat Apr 26 '18 at 21:06
  • Thanks! I didn't see this question! It works fine with my data!! – AJ Bérubé Apr 30 '18 at 14:29
  • Possible duplicate of [Finding Overlaps between interval sets / Efficient Overlap Joins](https://stackoverflow.com/questions/25815032/finding-overlaps-between-interval-sets-efficient-overlap-joins) – Bulat Apr 30 '18 at 19:48

1 Answer


The blunt-instrument solution to this is to first do a cartesian product (cross join), then filter.

What I might consider is something like the following (note: this is not guaranteed-to-run code, since you didn't provide a reproducible example).

Essentially, split your total data into 17 sub-dataframes, one per elephant. Then get every ordered pair of distinct elephants. Next, write a function that takes the cartesian product of any two elephants and keeps only the rows where the 'y' elephant's fix is within the 6-hour window of the 'x' elephant's fix. Use map2 to pass in the pairs of elephants and bind the results together. Because the data was filtered down, the actual position data is gone at that point, so it has to be joined back on. Then you can do the rest of whatever it is you were going to do.

library(dplyr)

each_elephant = split(Data_real, Data_real$Elephant)
pairs = expand.grid(x = levels(Data_real$Elephant),
                    y = levels(Data_real$Elephant),
                    stringsAsFactors = FALSE)
pairs = pairs[pairs$x != pairs$y, ]  # don't pair an elephant with itself

fuzzyJoin = function(e1, e2){
  df1 = each_elephant[[e1]] %>% select("Elephant.x" = Elephant,
                                       "Date.time.x" = Date.time,
                                       Date.time_plus6h,
                                       Date.time_minus6h)
  df2 = each_elephant[[e2]] %>% select("Elephant.y" = Elephant,
                                       "Date.time.y" = Date.time)
  totalDF = tidyr::crossing(df1, df2)  # cartesian product of the two tracks
  totalDF = totalDF %>%
    filter(Date.time.y >= Date.time_minus6h & Date.time.y <= Date.time_plus6h)
  return(totalDF)
}

output = purrr::map2(pairs$x, pairs$y, fuzzyJoin) %>%
  bind_rows() %>%
  left_join(Data_real, by = c("Elephant.x" = "Elephant", "Date.time.x" = "Date.time")) %>%
  left_join(Data_real, by = c("Elephant.y" = "Elephant", "Date.time.y" = "Date.time"))
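From there, the "rest" the question asks about is the distance and the under-100-m proportion. A minimal sketch with a tiny made-up table standing in for the joined result (the column names with .x/.y suffixes are an assumption about how the two left_joins disambiguate the duplicated coordinate columns, and the values are invented):

```r
library(dplyr)

# Hypothetical stand-in for the joined result: one row per temporally
# matched pair of fixes, coordinates suffixed .x / .y by the two joins
matched <- tibble(
  Elephant.x    = c("Bull", "Bull", "Bull"),
  Elephant.y    = c("Tusker", "Tusker", "Tusker"),
  coords.x1.x   = c(329468, 329393, 328341),
  coords.x1.1.x = c(329468, 329393, 328341),
  coords.x1.y   = c(329500, 329550, 340000),
  coords.x1.1.y = c(329500, 329550, 340000)
)

assoc <- matched %>%
  # Euclidean distance between the two fixes of each matched pair, in metres
  mutate(distance = sqrt((coords.x1.x - coords.x1.y)^2 +
                         (coords.x1.1.x - coords.x1.1.y)^2)) %>%
  # proportion of temporally matched fixes closer than 100 m, per pair
  group_by(Elephant.x, Elephant.y) %>%
  summarise(prop_within_100m = mean(distance < 100), .groups = "drop")
```

The same mutate/group_by/summarise applied to output gives one association score per ordered pair of elephants.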
Mark