
I want to calculate the distance between consecutive rows of a table using R. I have a 967 x 35 data table, laid out as below.

1. 6.23  3.3   4.36  3.9    ----  4.50   1.50  3.35   (35 columns)
2. 5.00  2.3   3.36  4.39   ----  2.52   3.40  2.37   (35 columns)
3. 5.23  2.6   5.64  4.23   ----  3.50   4.55  3.48   (35 columns)

How can I calculate the distance between the corresponding cells of two rows? For example, I want the distance between row 1 and row 2, between row 2 and row 3, and so on.

The formula for the distance between rows 1 and 2 is then: square root of {(6.23-5)^2 + (3.3-2.3)^2 + (4.36-3.36)^2 + (3.9-4.39)^2 + ... + (4.5-2.52)^2 + (1.5-3.4)^2 + (3.35-2.37)^2}. The same calculation applies to every other pair of consecutive rows (rows 1-2, 2-3, 3-4, ..., and finally 967-1), for 967 distances in total.
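
In symbols, writing $x_{i,j}$ for the value in row $i$ and column $j$ (this notation is introduced here only for illustration), the calculation described above would be:

```latex
d_i = \sqrt{\sum_{j=1}^{35} \left(x_{i,j} - x_{i+1,j}\right)^2}, \quad i = 1, \dots, 966,
\qquad
d_{967} = \sqrt{\sum_{j=1}^{35} \left(x_{967,j} - x_{1,j}\right)^2}
```

The second expression covers the last pair, which wraps around from row 967 back to row 1.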

But I don't know how to code this in R.

I uploaded the data table (called "R_skills").

Then, in R, I wrote the following:

  Read.xlsx(R_skills)
  sample.matrix<-matrix(c(1:33635,ncol=35)
  paralleldist(x=sample.mtarix,method="dtw")

Error: unexpected symbol in: "sample.matrix<-matrix(c(1:33635,ncol=35) paralleldist"

SCouto
Lynn Jung
    Your question is not really well formed for Stack Overflow; please edit it per the recommendations at https://stackoverflow.com/editing-help. At first glance, though, you may be missing a right paren at the end of your `sample.matrix<-matrix...` call: you have two lefts but only one right. If that is not it, or not sufficient, please consider making this question a little easier to follow and more reproducible: post your sample data in an unambiguous format (e.g., `dput(head(x))`), and put your code in code blocks. See https://stackoverflow.com/questions/5963269 and https://stackoverflow.com/help/mcve. – r2evans Jun 03 '19 at 16:13

2 Answers


This should do it for you:

sapply(1:(nrow(dt)-1),function(t,dt){dist(dt[t:(t+1),])},dt)
tushaR
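
For comparison, here is a vectorized sketch of the same consecutive-row Euclidean distance that avoids calling `dist()` once per pair. The small matrix `m` and the helper name `consecutive_dist` are made up for illustration; `m` only reuses the first four values of the three rows shown in the question, not the full data.

```r
# Hypothetical 3 x 4 example matrix (filled column by column).
m <- matrix(c(6.23, 5.00, 5.23,
              3.30, 2.30, 2.60,
              4.36, 3.36, 5.64,
              3.90, 4.39, 4.23), nrow = 3)

# Euclidean distance between each row and the row directly below it.
consecutive_dist <- function(x) {
  x <- as.matrix(x)
  # Rows 2..n minus rows 1..(n-1), computed in one matrix subtraction.
  d <- x[-1, , drop = FALSE] - x[-nrow(x), , drop = FALSE]
  sqrt(rowSums(d^2))
}

consecutive_dist(m)
```

This returns one value per adjacent pair (n - 1 values for n rows), so it should agree with the `sapply`/`dist` one-liner on the same input.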

@tushaR got it right for base R. :)

mat <- sample.matrix <- matrix(1:33635,ncol=35)

# Euclidean distance: square each difference, sum, then take the root
dist_fun <- function(x, y) sqrt(sum((x - y)^2))

# Each helper returns the distance between row x and row x+1 for every x in t
s_fun <- function(t){
        sapply(t, function(x) dist_fun(mat[x,], mat[x+1,]) )
}
m_fun <- function(t){
        mapply(function(x) dist_fun(mat[x,], mat[x+1,]), t)
}
a_fun <- function(t){
        apply(matrix(t, nrow = 1), 2, function(x) dist_fun(mat[x,], mat[x+1,]) )
}
l_fun <- function(t){
        unlist( lapply(as.list(t), function(x) dist_fun(mat[x,], mat[x+1,]) ) )
}

# t <- 1:(nrow(mat)-1)
# s_fun( 1:(nrow(mat)-1) )

library(microbenchmark)
n <- 1e5 # 961
mat <- sample.matrix <- matrix(1:(35*n),ncol=35)

microbenchmark("sapply" = s_fun(1:(nrow(mat)-1)),
               "mapply" = m_fun(1:(nrow(mat)-1)),
               "apply" = a_fun(1:(nrow(mat)-1)),
               "lapply" = l_fun(1:(nrow(mat)-1)),
               list = NULL, times = 100L, unit = "ms", check = NULL,
               control = list(), setup = NULL)
#> Unit: milliseconds
#>    expr      min       lq     mean   median       uq      max neval
#>  sapply 315.5892 413.6667 534.0462 498.0186 600.1315 1096.092   100
#>  mapply 313.7013 441.9728 534.9250 505.3026 577.5770 1167.973   100
#>   apply 387.1655 503.9833 615.6288 563.4751 665.4584 1571.387   100
#>  lapply 309.3762 416.0796 553.0482 491.5356 645.2026 1823.269   100

Created on 2019-06-04 by the reprex package (v0.2.1)
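
The question also asks for the pair that wraps around from the last row back to the first (967 distances in total), which the calls over `1:(nrow(mat)-1)` do not cover. A minimal sketch of one way to include it, assuming a hypothetical helper `cyclic_dist` and a small random stand-in matrix rather than the real data:

```r
set.seed(1)
x <- matrix(runif(5 * 3), nrow = 5)  # stand-in for the 967 x 35 table

# Distance from each row to the next one, with the last row paired with row 1.
cyclic_dist <- function(x) {
  x <- as.matrix(x)
  # Reorder rows as 2, 3, ..., n, 1 so row i lines up with its successor.
  nxt <- x[c(seq_len(nrow(x))[-1], 1), , drop = FALSE]
  sqrt(rowSums((x - nxt)^2))
}

d <- cyclic_dist(x)
length(d)  # one distance per row, including the wrap-around pair
```

Dropping the last element of `d` gives the n - 1 consecutive-row distances without the wrap-around.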

cbo