2

I am dealing with an original dataset has more than 20000 rows. A condensed version of this looks something like this below

  Row      x      y      z      Group   Survive
  1        0.0680 0.8701 0.0619 1       78.43507
  2        0.9984 0.0016 0.0000 1       89.55533
  3        0.4146 0.5787 0.0068 1       85.35468
  4        0.3910 0.6016 0.0074 2       67.49987
  5        0.3902 0.6023 0.0075 2       81.87669   
  6        0.0621 0.8701 0.0678 2       27.26777
  7        0.6532 0.3442 0.0026 3       53.03938  
  8        0.6508 0.3466 0.0026 3       62.32931 
  9        0.9977 0.0023 0.0000 3       97.00324

My goal is to create a column called Match1 as shown below

  Row      x      y      z      Group   Survive   Match1
  1        0.0680 0.8701 0.0619 1       78.43507  g1r1-g2r3
  2        0.9984 0.0016 0.0000 1       89.55533  g1r2-g2r1
  3        0.4146 0.5787 0.0068 1       85.35468  g1r3-g2r2
  1        0.3910 0.6016 0.0074 2       67.49987  g1r2-g2r1
  2        0.3902 0.6023 0.0075 2       81.87669  g1r3-g2r2 
  3        0.0621 0.8701 0.0678 2       27.26777  g1r1-g2r3
  1        0.6532 0.3442 0.0026 3       53.03938  NA
  2        0.6508 0.3466 0.0026 3       62.32931  NA
  3        0.9977 0.0023 0.0000 3       97.00324  NA

The logic behind the values g1r1-g2r3, g1r2-g2r1, g1r3-g2r2 is as follows

1st step, a distance matrix is generated between rows in Group1 and Group2 based on Mahalanobis or simple distance method , sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)

  0.4235 = sqrt{ (0.3910-0.0680)^2 + (0.6016-0.8701)^2 + (0.0074-0.0619)^2} 
  0.4225 = sqrt{ (0.3902-0.0680)^2 + (0.6023-0.8701)^2 + (0.0075-0.0619)^2}
  0.0083 = sqrt{ (0.0621-0.0680)^2 + (0.8701-0.8701)^2 + (0.0678-0.0619)^2}  

  0.8538 = sqrt{ (0.3910-0.9984)^2 + (0.6016-0.0016)^2 + (0.0074-0.0000)^2} 
  0.8549 = sqrt{ (0.3902-0.9984)^2 + (0.6023-0.0016)^2 + (0.0075-0.0000)^2}
  1.2789 = sqrt{ (0.0621-0.9984)^2 + (0.8701-0.0016)^2 + (0.0678-0.0000)^2}

  0.0329 = sqrt{ (0.3910-0.4146)^2 + (0.6016-0.5787)^2 + (0.0074-0.0068)^2} 

Group1 vs Group2

        g2r1    g2r2    g2r3
g1r1    0.4235  0.4225  0.0083
g1r2    0.8538  0.8549  1.2789
g1r3    0.0329  0.0340  0.4614

2nd step, find the minimum or smallest distance in each row.

        g2r1        g2r2        g2r3
g1r1    0.4235      0.4225      **0.0083**
g1r2    **0.8538**  0.8549      1.2789
g1r3    0.0329*     **0.0340**  0.4614

The column Match1 takes value g1r1-g2r3 because rows , Row1-Group1 and Row3-Group2 result in smallest distance 0.0083. Similarly g1r2-g2r1 because, Row2-Group1 and Row1-Group2 results in smallest value 0.8538. Although 0.0329 is the smallest value in the last row of the distance matrix we skip this value and chose the next smallest value 0.0340 because choosing 0.0329 will result in pairing Row3-Group1 with Row1-Group2 and Row1-Group2 is already paired with Row2-Group1, so we chose the next smallest value 0.0340 which results in g1r1-g2r3.

3rd step, calculate average survival based on matched observations in Step2.

 (78.43507 - 27.26777) + (89.55533 - 67.49987) + (85.35468 -81.87669)/3 = 25.56692

I am not sure how to string together these steps programatically I would appreciate any suggestions or help putting all these pieces together efficiently.

Science11
  • 788
  • 1
  • 8
  • 24
  • 2
    A couple pointers: `mahalanobis` is a built-in function. You should use it once, store the result and then pull from it. You may need to convert the result, as it looks like it is the squared distance. Second, the `matrixStats` package allows you to take `rowMin`s efficiently. This is probably the way to go. Finally, Base R's `rowsum` function efficiently calculates group sums, which you can easily convert to group means. – lmo Jan 15 '19 at 12:23
  • @lmo to be honest, I dont know how to calculate the mahalanobis distance using any r packages. I can do this manually but not sure how to do this in r. – Science11 Jan 15 '19 at 18:27
  • 2
    @Science11 lmo is saying that there is a base R function `mahalanobis()` that is already installed in your system that you can use to calculate Mahalanobis distance. Just type `?mahalanobis` in the console to learn more – divibisan Jan 15 '19 at 23:11

0 Answers0