How to calculate all possible combinations of distance and time differences between grouped rows of data in R?

Question

My data consists of locations and timestamps recorded for dollar bills as they travel. It looks like this:

   bid ts latitude longitude
1  123  0 38.40513  41.83777
2  123 23 38.41180  41.68493
3  123 45 42.20771  43.36318
4  123 50 40.22803  43.00208
5  456  0 39.12882  42.73877
6  456 12 38.46078  42.79847
7  456 27 40.53698  42.57617
8  456 19 39.04038  42.17070
9  234  0 39.18274  41.17445
10 234  8 39.58652  43.61317
11 234 15 41.32383  41.49377
12 234 23 40.26008  42.01927

bid = bill id

ts = time stamp (days) calculated from original data point when t = 0

latitude and longitude = location

This data shows the movements of individual bills around the United States.

I want to calculate the squared distance and the time difference between all possible pairs of rows within each group of rows that share a bid (4 rows per group here). For example, for bid 123 I want to calculate the differences in distance and time between: row 1 and row 2, row 1 and row 3, row 1 and row 4, row 2 and row 3, row 2 and row 4, row 3 and row 4.

This would give me every pairwise calculation within that bid.

I was able to do this with dplyr between successive rows like this:

detach("package:plyr", unload=TRUE)
library(magrittr)
library(dplyr)
library(geosphere)

deltadata <- group_by(df, bid) %>%
  mutate(
    dsq = (c(NA, distHaversine(cbind(longitude[-n()], latitude[-n()]),
                               cbind(longitude[-1], latitude[-1]))))^2,
    dt = c(NA, diff(ts))
  ) %>%
  ungroup() %>%
  filter(!is.na(dsq))
deltadata

# A tibble: 21 x 6
     bid    ts latitude longitude          dsq    dt
   <dbl> <dbl>    <dbl>     <dbl>        <dbl> <dbl>
 1   123    23 38.41180  41.68493    178299634    23
 2   123    45 42.20771  43.36318 198827672092    22
 3   123    50 40.22803  43.00208  49480260636     5
 4   456    12 38.46078  42.79847   5557152213    12
 5   456    27 40.53698  42.57617  53781504422    15
 6   456    19 39.04038  42.17070  28958550947    -8
 7   234     8 39.58652  43.61317  46044153364     8
 8   234    15 41.32383  41.49377  69621429008     7
 9   234    23 40.26008  42.01927  15983792199     8
 10   345     5 40.25700  41.69525  26203255328     5
# ... with 11 more rows

PROBLEM: This only calculates the squared distances and time differences between successive rows, namely: row 1 and row 2, row 2 and row 3, row 3 and row 4.

Is there a practical way that I can do this for all possible combinations of rows in each group?

I would like my output to have 6 calculations for each bid like this:

# A tibble: 21 x 6
     bid    ts latitude longitude          dsq    dt
   <dbl> <dbl>    <dbl>     <dbl>        <dbl> <dbl>
 1   123    23 38.41180  41.68493    178299634    23  (for rows 1 and 2)
 2   123    45 42.20771  43.36318 198827672092    22  (for rows 1 and 3)
 3   123    50 40.22803  43.00208  49480260636     5  (for rows 1 and 4)
 4   123    12 38.46078  42.79847   5557152213    12  (for rows 2 and 3)
 5   123    27 40.53698  42.57617  53781504422    15  (for rows 2 and 4)
 6   123    19 39.04038  42.17070  28958550947    -8  (for rows 3 and 4)

I am new to R, so any suggestions are appreciated!

Ralf Stubner · Answer 1 · 2018-04-04T21:51:13.500

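You can join the data to itself on bid with inner_join() and then keep only the pairs with a positive time difference, so that each pair appears once: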

library(dplyr)
library(geosphere)

df <- read.table(text = '   bid ts latitude longitude
1  123  0 38.40513  41.83777
2  123 23 38.41180  41.68493
3  123 45 42.20771  43.36318
4  123 50 40.22803  43.00208
5  456  0 39.12882  42.73877
6  456 12 38.46078  42.79847
7  456 27 40.53698  42.57617
8  456 19 39.04038  42.17070
9  234  0 39.18274  41.17445
10 234  8 39.58652  43.61317
11 234 15 41.32383  41.49377
12 234 23 40.26008  42.01927')


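# self-join on bid: every row is paired with every row of the same bill
# (newer dplyr versions may warn about a many-to-many relationship here,
# which is the intended behaviour for this join)
df %>%
  inner_join(df, by = "bid") %>%
  mutate(
    dsq = distHaversine(cbind(longitude.x, latitude.x),
                        cbind(longitude.y, latitude.y))^2,
    dt = ts.x - ts.y
  ) %>%
  # drop self-pairs and mirrored duplicates; keep the later row as .x
  filter(dt > 0)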
#>    bid ts.x latitude.x longitude.x ts.y latitude.y longitude.y          dsq dt
#> 1  123   23   38.41180    41.68493    0   38.40513    41.83777    178300279 23
#> 2  123   45   42.20771    43.36318    0   38.40513    41.83777 195932999496 45
#> 3  123   45   42.20771    43.36318   23   38.41180    41.68493 198827439286 22
#> 4  123   50   40.22803    43.00208    0   38.40513    41.83777  51230447939 50
#> 5  123   50   40.22803    43.00208   23   38.41180    41.68493  53740739037 27
#> 6  123   50   40.22803    43.00208   45   42.20771    43.36318  49479978030  5
#> 7  456   12   38.46078    42.79847    0   39.12882    42.73877   5557111219 12
#> 8  456   27   40.53698    42.57617    0   39.12882    42.73877  24765506646 27
#> 9  456   27   40.53698    42.57617   12   38.46078    42.79847  53781664569 15
#> 10 456   27   40.53698    42.57617   19   39.04038    42.17070  28958542352  8
#> 11 456   19   39.04038    42.17070    0   39.12882    42.73877   2506329323 19
#> 12 456   19   39.04038    42.17070   12   38.46078    42.79847   7133122323  7
#> 13 234    8   39.58652    43.61317    0   39.18274    41.17445  46043956815  8
#> 14 234   15   41.32383    41.49377    0   39.18274    41.17445  57544071797 15
#> 15 234   15   41.32383    41.49377    8   39.58652    43.61317  69621225065  7
#> 16 234   23   40.26008    42.01927    0   39.18274    41.17445  19614888600 23
#> 17 234   23   40.26008    42.01927    8   39.58652    43.61317  24136886438 15
#> 18 234   23   40.26008    42.01927   15   41.32383    41.49377  15983645507  8
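
One caveat with filtering on dt > 0: two rows of the same bill that share a timestamp have dt = 0, so that pair is dropped. The sample data has no such ties, but if yours might, a possible workaround is to compare row positions instead of timestamps. The following is only a sketch in base R using merge(); the toy data frame, the bid 999, and the column name row are made up for illustration and are not from the question.

```r
# Hypothetical toy data: one bill whose last two sightings share a timestamp
toy <- data.frame(
  bid       = c(999, 999, 999),
  ts        = c(0, 5, 5),
  latitude  = c(38.0, 39.0, 40.0),
  longitude = c(41.0, 42.0, 43.0)
)

# Row position within each bill (1, 2, 3, ...)
toy$row <- ave(seq_along(toy$bid), toy$bid, FUN = seq_along)

# Self-join on bid: every row paired with every row of the same bill
joined <- merge(toy, toy, by = "bid", suffixes = c(".x", ".y"))

# Keep each unordered pair once by comparing row positions, not timestamps
by_row <- joined[joined$row.x < joined$row.y, ]
by_row$dt <- by_row$ts.y - by_row$ts.x

# For comparison: the timestamp-based filter loses the tied pair
by_ts <- joined[joined$ts.x - joined$ts.y > 0, ]

nrow(by_row)  # 3 pairs
nrow(by_ts)   # 2 pairs
```

The squared distance can then be added to by_row with distHaversine() exactly as in the mutate() call above.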

Ralf Stubner · Accepted Answer · 2018-04-09T19:38:27.833

And since you also use the data.table tag, here is a solution using that package:

library(data.table)
library(geosphere)

df <- read.table(text = '   bid ts latitude longitude
1  123  0 38.40513  41.83777
2  123 23 38.41180  41.68493
3  123 45 42.20771  43.36318
4  123 50 40.22803  43.00208
5  456  0 39.12882  42.73877
6  456 12 38.46078  42.79847
7  456 27 40.53698  42.57617
8  456 19 39.04038  42.17070
9  234  0 39.18274  41.17445
10 234  8 39.58652  43.61317
11 234 15 41.32383  41.49377
12 234 23 40.26008  42.01927')
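dt <- data.table(df, key = 'bid')
# self-join on the key (bid); columns from the joined-in copy get the prefix "i."
# keeping ts < i.ts leaves each pair once, with the earlier row first
dt <- dt[dt, allow.cartesian = TRUE][ts < i.ts]
# add the time difference and the squared distance by reference
dt[, dt := i.ts - ts][, dsq := distHaversine(cbind(longitude, latitude),
                                             cbind(i.longitude, i.latitude))^2]
dt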
#>     bid ts latitude longitude i.ts i.latitude i.longitude dt          dsq
#>  1: 123  0 38.40513  41.83777   23   38.41180    41.68493 23    178300279
#>  2: 123  0 38.40513  41.83777   45   42.20771    43.36318 45 195932999496
#>  3: 123 23 38.41180  41.68493   45   42.20771    43.36318 22 198827439286
#>  4: 123  0 38.40513  41.83777   50   40.22803    43.00208 50  51230447939
#>  5: 123 23 38.41180  41.68493   50   40.22803    43.00208 27  53740739037
#>  6: 123 45 42.20771  43.36318   50   40.22803    43.00208  5  49479978030
#>  7: 234  0 39.18274  41.17445    8   39.58652    43.61317  8  46043956815
#>  8: 234  0 39.18274  41.17445   15   41.32383    41.49377 15  57544071797
#>  9: 234  8 39.58652  43.61317   15   41.32383    41.49377  7  69621225065
#> 10: 234  0 39.18274  41.17445   23   40.26008    42.01927 23  19614888600
#> 11: 234  8 39.58652  43.61317   23   40.26008    42.01927 15  24136886438
#> 12: 234 15 41.32383  41.49377   23   40.26008    42.01927  8  15983645507
#> 13: 456  0 39.12882  42.73877   12   38.46078    42.79847 12   5557111219
#> 14: 456  0 39.12882  42.73877   27   40.53698    42.57617 27  24765506646
#> 15: 456 12 38.46078  42.79847   27   40.53698    42.57617 15  53781664569
#> 16: 456 19 39.04038  42.17070   27   40.53698    42.57617  8  28958542352
#> 17: 456  0 39.12882  42.73877   19   39.04038    42.17070 19   2506329323
#> 18: 456 12 38.46078  42.79847   19   39.04038    42.17070  7   7133122323
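
If you would rather get the pairs in row order, exactly as the question lists them (row 1 and row 2, row 1 and row 3, ..., row 3 and row 4), one option is to enumerate the index pairs per bill with combn(). The following is a rough base R sketch, not a tested replacement for the code above. The helper names haversine_m and pairs_for_group are made up for this example, and haversine_m is a stand-in for geosphere::distHaversine that assumes the same Earth radius of 6378137 m.

```r
# First two bills from the question
df <- data.frame(
  bid       = c(123, 123, 123, 123, 456, 456, 456, 456),
  ts        = c(0, 23, 45, 50, 0, 12, 27, 19),
  latitude  = c(38.40513, 38.41180, 42.20771, 40.22803,
                39.12882, 38.46078, 40.53698, 39.04038),
  longitude = c(41.83777, 41.68493, 43.36318, 43.00208,
                42.73877, 42.79847, 42.57617, 42.17070)
)

# Hypothetical helper: haversine distance in metres
# (r is assumed to match the default radius of geosphere::distHaversine)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: all unordered row pairs within one bill
pairs_for_group <- function(g) {
  if (nrow(g) < 2) return(NULL)      # a single row has no pairs
  idx <- combn(nrow(g), 2)           # 2 x choose(n, 2) matrix of row indices
  i <- idx[1, ]
  j <- idx[2, ]
  data.frame(
    bid  = g$bid[i],
    row1 = i,
    row2 = j,
    dt   = g$ts[j] - g$ts[i],
    dsq  = haversine_m(g$longitude[i], g$latitude[i],
                       g$longitude[j], g$latitude[j])^2
  )
}

# Apply per bill and stack the results
all_pairs <- do.call(rbind, lapply(split(df, df$bid), pairs_for_group))
rownames(all_pairs) <- NULL
all_pairs
```

Because the pairs follow row order rather than time order, dt can be negative here, for example for rows 3 and 4 of bid 456 (ts 27 and 19). Sort each group by ts first if you want dt to be positive throughout.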
