
For each patient, I want to find the closest other patient and the distance between them. The difficulty is that the same patient name may appear in both the Patient1 and Patient2 columns. See the example below:

Patient1    Patient2    Distance
A           B           8
A           C           11
A           D           19
A           E           23
B           F           6
C           G           25

So the output I need is:

Patient Patient_closest_distance Distance
A       B                        8
B       F                        6
C       A                        11

I have tried using `list()` inside a data.table grouped by each column:

library(data.table)
DT <- data.table(Full_data)
j1 <- DT[ , list(Distance = min(Distance)), by = Patient1]
j2 <- DT[ , list(Distance = min(Distance)), by = Patient2]

However, this only gives me the minimum distance within each column separately. For example, C gets two results because it appears in both columns, instead of one result for its closest patient across both columns. I also only get the distances back, so I can't see which patient each distance is linked to:

   Patient1 SNP
1:        A   8

A. S. K.
  • I'm struggling to grasp what you're looking for. Are you able to supply some input data and what you expect the output to look like, e.g. `input <- data.table(A=.., B=..)`, `expected <- data.table(..)`? It can be a very rudimentary example with a few rows to show your point. As it is, I could interpret what you want incorrectly. `dput` is a good function for turning data into a paste-able format – Jonny Phelps Aug 15 '19 at 15:52
  • Hi! Here's a short tutorial of [how to make good reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so the community can better help you. I think you'll find it useful-- good luck! :) – Felix T. Aug 15 '19 at 16:02
  • You probably need to build a graph and find all shortest paths. Take a look at [this question and its answers](https://stackoverflow.com/questions/19996444/find-all-shortest-paths-using-igraph-r), which use the `igraph` package. That package most probably has an implementation of [Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm). – Alexis Aug 15 '19 at 17:19

1 Answer


The code below works.

# Create sample data frame
df <- data.frame(
  Patient1 = c('A','B', 'A', 'A', 'C', 'B'),
  Patient2 = c('B', 'A','C', 'D', 'D', 'F'),
  Distance = c(10, 1, 20, 3, 60, 20)
)
# Format as character variables (instead of factors)
df$Patient1 <- as.character(df$Patient1)
df$Patient2 <- as.character(df$Patient2)

# If you want mirror paths included, you'll need to add them.
# Ex.) A to C at a distance of 20 is equivalent to C to A at a distance of 20
# If you don't need these mirror paths, you can ignore these two lines.
df_mirror <- data.frame(Patient1 = df$Patient2, Patient2 = df$Patient1, Distance = df$Distance)
df <- rbind(df, df_mirror); rm(df_mirror)

# Collapse duplicate pairs, keeping the minimum distance for each pair
library(dplyr)
df <- summarise(group_by(df, Patient1, Patient2), min(Distance))

# Re-sort so the smallest distances come first
nearest <- df[order(df$`min(Distance)`), ]
# Keep only the first (closest) row for each Patient1
nearest <- nearest[!duplicated(nearest$Patient1),]
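For comparison, here is a minimal sketch of the same mirror-then-pick-the-minimum idea written with data.table, the package used in the question. It is only an illustration under some assumptions: the sample values are copied from the question's example table, and the names `both`, `Patient` and `Closest` are made up for this sketch.

```r
library(data.table)

# Sample data copied from the example table in the question
DT <- data.table(
  Patient1 = c("A", "A", "A", "A", "B", "C"),
  Patient2 = c("B", "C", "D", "E", "F", "G"),
  Distance = c(8, 11, 19, 23, 6, 25)
)

# Stack every pair in both directions so that each patient
# shows up in a single column (hypothetical name: Patient)
both <- rbind(
  DT[, .(Patient = Patient1, Closest = Patient2, Distance = Distance)],
  DT[, .(Patient = Patient2, Closest = Patient1, Distance = Distance)]
)

# For each patient, keep the one row with the smallest distance
nearest <- both[, .SD[which.min(Distance)], by = Patient]
nearest
```

On this sample, A, B and C come out paired with B, F and A respectively, matching the output requested in the question. Patients that only appear in Patient2 (D to G) also get a row because of the mirroring; filtering with something like `nearest[Patient %in% DT$Patient1]` should drop them if they are not wanted.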
Monk
  • Thanks for your help, Monk. The code above almost addresses my question, but I am still getting multiple rows with the same patient rather than only the row with the min distance for that patient in either column A or column B. Please see the example results below: PatientA PatientB `min(Distance)` 1 A B 48 2 A C 33 3 A D 34 4 A E 58 5 A F 27 6 A G 38 7 A H 38 – Catherine Aug 19 '19 at 07:52
  • Ah, gotcha. I've updated my code above. It should be doing what you're expecting now. :) – Monk Aug 19 '19 at 12:54