
I was wondering if there is a specific parameter to output a table with all the row names when using dist() and as.matrix(). Here's what I mean:

first=c('john', 'judy', 'jenny')
second=c(3,6,9)
third = c(2,4,6)
df = data.frame(first,second,third)

I have this data frame called df:

  first second third
1  john      3     2
2  judy      6     4
3 jenny      9     6

Here's my desired output:

          john    judy    jenny
john  0.000000 4.41588 8.831761
judy  4.415880 0.00000 4.415880
jenny 8.831761 4.41588 0.000000

This is my code:

df.dist=dist(df)
df.dist=as.matrix(df.dist, labels=TRUE)
df.dist

And here's what R is giving me:

         1       2        3
1 0.000000 4.41588 8.831761
2 4.415880 0.00000 4.415880
3 8.831761 4.41588 0.000000

I was wondering if there is a specific function or parameter that renames the columns when comparing different entries, or do we just need to code that ourselves?

Another thing I saw when I typed ?as.matrix is that there is a parameter called dimnames that lets you supply a list of names for the columns and rows. But I don't know if that would be a good idea, since my dataset has 100+ entries.

Any help is deeply appreciated. Been stuck for a while.

3442
jason adams
  • Can you include a `dput()` of your `df` data.frame, as described in [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? It's unclear whether you have a column with names or whether the names are row names of the data.frame. If you have mixed character and numeric values, `dist()` should be throwing an error. – MrFlick Nov 24 '14 at 21:23

2 Answers


It is only one line of code to add these names as row and column names:

df<-read.table(header=TRUE,text='first second third
1 john      3     2
2 judy      6     4
3 jenny      9     6')

df.dist=dist(df)
df.dist=as.matrix(df.dist, labels=TRUE)
colnames(df.dist) <- rownames(df.dist) <- df[['first']] #this is the only line

> df.dist
          john    judy    jenny
john  0.000000 4.41588 8.831761
judy  4.415880 0.00000 4.415880
jenny 8.831761 4.41588 0.000000

Setting dimnames directly would work too, since row and column names are stored in the dimnames attribute, but the line above is the shorter way to do it.
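
On the 100+ entries concern from the question: the names are read from the data frame column rather than typed by hand, so the same assignment works for any number of rows. Below is a minimal sketch, assuming the names sit in a column called `first` and every other column is numeric. It passes only the numeric columns to `dist()`, which is a choice made for this sketch and not what the code above does.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  first  = c("john", "judy", "jenny"),
  second = c(3, 6, 9),
  third  = c(2, 4, 6)
)

# Distance matrix computed on the numeric columns only (sketch assumption)
m <- as.matrix(dist(df[c("second", "third")]))

# Label rows and columns in one step; as.character() guards against
# `first` being stored as a factor in older R versions
nm <- as.character(df$first)
dimnames(m) <- list(nm, nm)

m
```

Nothing here depends on the number of rows, so a 100-row data frame is labelled the same way.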

ABCD
LyzandeR

You can also set the first column as the data frame rownames, then use dist:

rownames(df) <- df$first
as.matrix(dist(df[-1]))

#          john     judy    jenny
#john  0.000000 3.605551 7.211103
#judy  3.605551 0.000000 3.605551
#jenny 7.211103 3.605551 0.000000
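
These values (3.605551) differ from the 4.41588 in the question and the first answer. A likely explanation, based on the documented NA handling of `dist()`: when the name column is passed along, it cannot be converted to numbers and is treated as missing, and `dist()` then scales the sum up in proportion to the number of columns actually used (3 columns, 2 usable). The sketch below reproduces both numbers. The all-NA column `first` stands in for the coerced name column and is an assumption of this sketch.

```r
# Numeric columns from the question's data frame
num <- cbind(second = c(3, 6, 9), third = c(2, 4, 6))
rownames(num) <- c("john", "judy", "jenny")

# Assumption: the character column ends up as an all-NA column inside dist()
with_na <- cbind(first = NA_real_, num)

d_clean <- as.matrix(dist(num))      # Euclidean distance on the 2 numeric columns
d_na    <- as.matrix(dist(with_na))  # 1 of 3 columns missing, sum rescaled by 3/2

d_clean["john", "judy"]  # sqrt(13), about 3.605551
d_na["john", "judy"]     # sqrt(13 * 3/2), about 4.415880

# The two matrices differ by a constant factor of sqrt(3/2)
all.equal(unname(d_na), unname(d_clean) * sqrt(3 / 2))
```

If that explanation holds, dropping the name column before calling `dist()`, as done above with `df[-1]`, gives the unscaled Euclidean distances.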
Psidom