This question is related to this post: How to apply dtw algorithm on multiple time series in R?

The original post uses a data frame with only one variable of interest: speed.kph.ED.

#data: 8 observations, 3 cars 
file.ID2 <- c("Cars_03", "Cars_03", "Cars_03", 
              "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_04", 
              "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", 
              "Cars_04", "Cars_05", "Cars_05", "Cars_05", "Cars_05", "Cars_05", 
              "Cars_05", "Cars_05", "Cars_05")
speed.kph.ED <- c(129.3802848, 
                  129.4022304, 129.424176, 129.4461216, 129.4680672, 129.47904, 
                  129.5009856, 129.5229312, 127.8770112, 127.8221472, 127.7672832, 
                  127.7124192, 127.6575552, 127.6026912, 127.5478272, 127.4929632, 
                  134.1095616, 134.1205344, 134.1315072, 134.1534528, 134.1644256, 
                  134.1753984, 134.1863712, 134.197344)

df <- data.frame(file.ID2, speed.kph.ED)
df

As suggested by the accepted answer, here is the procedure to calculate the pairwise distances between the 3 cars (3 time series) using dtw:

library(dtw)
library(purrr)
library(dplyr)

# Split your data frame into a list by file.ID2
ds <- split(df, df$file.ID2)
ds

# Use expand.grid to make all combinations of your names, file.ID2 and your values
Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
Values <- expand.grid(ds, ds)

# purrr::map_dbl iterates through all row-combinations of Values and returns a vector of doubles
Dist <- map_dbl(1:nrow(Values), ~dtw(x = Values[.x,]$Var1[[1]]$speed.kph.ED, y = Values[.x,]$Var2[[1]]$speed.kph.ED)$distance)

# Bind answer to Names (dplyr is already loaded above)
ans <- Names %>% 
  mutate(distance = Dist)

ans

What if I have two more variables that I also want to take into account when calculating the distances between the 3 cars (3 time series)?

For example, let's say I have two more variables, score.kph.ED and rating.kph.ED:

score.kph.ED <- c(1:24)
rating.kph.ED <- c(25:48)


df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)
df

Now the distances between the 3 cars should be calculated based not only on speed.kph.ED, but also on score.kph.ED and rating.kph.ED.

How can I modify the existing code so that I can achieve this goal?

Thanks so much for your help!

DPatrick
  • How will you do this for one example without `map_dbl`? – Ronak Shah Jan 08 '21 at 07:19
  • @RonakShah Can you elaborate further? – DPatrick Jan 09 '21 at 08:15
  • Do you have an example of calculating distance between 3 cars (3 time series) ? – Ronak Shah Jan 09 '21 at 13:23
  • @RonakShah The original example is already calculating the distance between 3 cars (but only using `speed.kph.ED` to calculate that distance). I want to do the same thing, but to use `speed.kph.ED`, `score.kph.ED`, and `rating.kph.ED` to calculate the distance among 3 cars. – DPatrick Jan 09 '21 at 21:24

2 Answers

You could compute a separate DTW distance for each variable:

library(dtw)
library(purrr)

df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)
ds <- split(df, df$file.ID2)
Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
Values <- expand.grid(ds, ds)

cols <- names(df)[-1]
result <- map_dfc(cols, function(col) map_dbl(1:nrow(Values),
  ~dtw(x = Values[.x,]$Var1[[1]][[col]], 
       y = Values[.x,]$Var2[[1]][[col]])$distance))

names(result) <- paste0('dist.', cols)
cbind(Names, result)


#     Var1    Var2 dist.speed.kph.ED dist.score.kph.ED dist.rating.kph.ED
#1 Cars_03 Cars_03           0.00000                 0                  0
#2 Cars_04 Cars_03          25.66538                71                 71
#3 Cars_05 Cars_03          69.72117               191                191
#4 Cars_03 Cars_04          25.66538                71                 71
#5 Cars_04 Cars_04           0.00000                 0                  0
#6 Cars_05 Cars_04          96.00103                71                 71
#7 Cars_03 Cars_05          69.72117               191                191
#8 Cars_04 Cars_05          96.00103                71                 71
#9 Cars_05 Cars_05           0.00000                 0                  0
Ronak Shah

What you're trying to do is called multivariate DTW, and you can simplify things by using the proxy package. Check this other answer, but you can essentially do what you want like this (using the variables from your example):

proxy::dist(lapply(ds, function(x) { x[, -1L] }), method = "dtw")
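
If it helps to see what the multivariate version looks like without the proxy registry, here is a minimal sketch that calls dtw() directly on one matrix per car (rows are time steps, columns are the three variables). It assumes dtw()'s defaults, i.e. a Euclidean local distance between rows and the default step pattern; the names `ms` and `dist_mat` are just illustrative, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(dtw)

# Data from the question, rebuilt compactly
file.ID2 <- rep(c("Cars_03", "Cars_04", "Cars_05"), each = 8)
speed.kph.ED <- c(129.3802848, 129.4022304, 129.424176, 129.4461216,
                  129.4680672, 129.47904, 129.5009856, 129.5229312,
                  127.8770112, 127.8221472, 127.7672832, 127.7124192,
                  127.6575552, 127.6026912, 127.5478272, 127.4929632,
                  134.1095616, 134.1205344, 134.1315072, 134.1534528,
                  134.1644256, 134.1753984, 134.1863712, 134.197344)
score.kph.ED <- 1:24
rating.kph.ED <- 25:48
df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)

# One numeric matrix per car: rows = time steps, columns = variables
# (ms is an illustrative name, not something from the original code)
ms <- lapply(split(df[, -1], df$file.ID2), as.matrix)

# Pairwise multivariate DTW: one distance per pair of cars, using all
# three columns at once (dtw() defaults assumed for distance and step pattern)
dist_mat <- sapply(ms, function(b) sapply(ms, function(a) dtw(a, b)$distance))
dist_mat
```

This gives a single 3 x 3 matrix instead of one distance column per variable. Note that with a Euclidean local distance, variables on larger scales contribute more to the result, so whether to rescale the columns first is a judgment call for your data.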
Alexis