I have pairwise distances that I need to display/convert as a distance matrix. R should have a function for this, but I am not sure which one or how to use it. My data looks like this:

A1  A1  0.90
A1  B1  0.85
A1  C1  0.45
A1  D1  0.96
B1  B1  0.90
B1  C1  0.85
B1  D1  0.56
C1  C1  0.55
C1  D1  0.45
D1  D1  0.90

I want to convert/display it like this:

      A1      B1      C1      D1
A1    0.90    0.85    0.45    0.96
B1            0.90    0.85    0.56
C1                    0.55    0.45
D1                            0.90

What should I do? Thanks

user45270

1 Answer

You could use reshape:

df <- read.table(textConnection("
A1  A1  0.90
A1  B1  0.85
A1  C1  0.45
A1  D1  0.96
B1  B1  0.90
B1  C1  0.85
B1  D1  0.56
C1  C1  0.55
C1  D1  0.45
D1  D1  0.90"))

dfr <- reshape(df, direction="wide", idvar="V2", timevar="V1")
dfr
#   V2 V3.A1 V3.B1 V3.C1 V3.D1
# 1 A1  0.90    NA    NA    NA
# 2 B1  0.85  0.90    NA    NA
# 3 C1  0.45  0.85  0.55    NA
# 4 D1  0.96  0.56  0.45   0.9

d <- as.dist(dfr[, -1])
d
#      1    2    3
# 2 0.85          
# 3 0.45 0.85     
# 4 0.96 0.56 0.45

# reset labels
attr(d, "Labels") <- dfr[, 1]
d
#      A1   B1   C1
# B1 0.85          
# C1 0.45 0.85     
# D1 0.96 0.56 0.45
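A small variant of the reshape approach, sketched here under the assumption that the data are read with `stringsAsFactors = FALSE` so the labels stay character strings: `as.dist()` takes its labels from the row names of the matrix it receives, so setting them beforehand avoids patching the `Labels` attribute afterwards. The intermediate name `m` is illustrative only.

```r
# Same example data as in the question, labels kept as character strings
df <- read.table(text = "
A1  A1  0.90
A1  B1  0.85
A1  C1  0.45
A1  D1  0.96
B1  B1  0.90
B1  C1  0.85
B1  D1  0.56
C1  C1  0.55
C1  D1  0.45
D1  D1  0.90", stringsAsFactors = FALSE)

# One row per V2 label, one V3.* column per V1 label
dfr <- reshape(df, direction = "wide", idvar = "V2", timevar = "V1")

m <- as.matrix(dfr[, -1])  # numeric matrix of the V3.* columns
rownames(m) <- dfr$V2      # as.dist() uses these row names as its Labels
d <- as.dist(m)            # keeps only the lower triangle
d
```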

The solution mentioned by @alexis_laz seems to be more elegant:

as.dist(xtabs(df[, 3] ~ df[, 2] + df[, 1]))
#      A1   B1   C1
# B1 0.85          
# C1 0.45 0.85     
# D1 0.96 0.56 0.45
sgibb
  • But how do I get the labels back instead of 1, 2, 3? – user45270 Mar 19 '14 at 02:27
  • Please see my edit. BTW I really like the solution of @alexis_laz. – sgibb Mar 19 '14 at 09:46
  • Yes! The xtabs solution works. But I also want the values of A1, B1, C1 and D1 with themselves, that is A1-A1, B1-B1 and so on. How do I get those into the table? – user45270 Mar 19 '14 at 13:43
  • @user45270: I am not quite sure, but I think the `dist` class doesn't store the diagonal values (because they are mostly 1 or 0). BTW, why are your diagonal values different from 0/1, and different between A1-A1 and e.g. C1-C1? – sgibb Mar 19 '14 at 13:48
  • Hey, the values are DNA sequence similarities between species and within the same species. Either something curious is going on in this genus, or there could have been a technical error somewhere in the sequencing. – user45270 Mar 19 '14 at 13:59
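On the diagonal question: a `dist` object stores only the lower triangle (n(n-1)/2 values), so the within-species values cannot be kept in it. A minimal sketch, assuming a plain numeric matrix is acceptable instead of a `dist` object, uses base R matrix indexing (swapped in here instead of reshape/xtabs) to reproduce the layout from the question with the diagonal included. The names `labs`, `m` and `sym` are illustrative only.

```r
# Same example data as in the question, labels kept as character strings
df <- read.table(text = "
A1  A1  0.90
A1  B1  0.85
A1  C1  0.45
A1  D1  0.96
B1  B1  0.90
B1  C1  0.85
B1  D1  0.56
C1  C1  0.55
C1  D1  0.45
D1  D1  0.90", stringsAsFactors = FALSE)

# All labels that occur in either column, in sorted order
labs <- sort(unique(c(df$V1, df$V2)))

# Empty square matrix; pairs that are not in the data stay NA
m <- matrix(NA_real_, nrow = length(labs), ncol = length(labs),
            dimnames = list(labs, labs))

# A two-column character matrix indexes cells by their row/column names,
# so each (V1, V2) pair lands in the upper triangle or on the diagonal
m[cbind(df$V1, df$V2)] <- df$V3

# Upper triangle plus diagonal, with the empty (NA) cells printed as blanks
print(m, na.print = "")

# Optional: a full symmetric matrix, mirroring the upper triangle downwards
sym <- m
sym[lower.tri(sym)] <- t(sym)[lower.tri(sym)]
```

If a `dist` object is still needed later (e.g. for clustering), `as.dist(sym)` can be called on the symmetric matrix; the diagonal is dropped at that point.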