Right now my dataset is this, and I am trying to compute a distance matrix so that I can plot clusters. The strings must match exactly. I labeled the recipes A, B, C, but they could be "Pizza", "Pasta", "Salad", etc. I need a cluster chart that displays the connections between the recipes, but I need the distance matrix first. Right now I am using this:

       library(proxy)
       mat = as.matrix(dist(data)) 

I obtain a 9x9 matrix, not the 3x3 matrix I want.

How can I obtain a distance matrix based only on the recipes that the customers have in common (and, vice versa, on the customers that the recipes have in common), so that I can plot it?

hjpotter92
Buddy Holly
  • Welcome to SO. You could improve your question. Please read [how to provide minimal reproducible examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit & improve it accordingly. A good post usually provides minimal input data, the desired output data & code tries - all copy-paste-run'able. – lukeA Jun 27 '16 at 21:37

1 Answer

Here's how you could create a distance matrix:

data <- read.table(sep=",", text="1,A
2,B
1,C
2,C
2,B
3,A
3,B
3,C
3,D")
data <- reshape2::dcast(
  data, 
  V1~V2, 
  fun.aggregate = length, 
  value.var="V2"
)
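# Drop the ID column (V1) so that only the recipe columns enter the distance.
(mat <- as.matrix(dist(data[-1], method = "binary")))
#           1         2   3
# 1 0.0000000 0.6666667 0.5
# 2 0.6666667 0.0000000 0.5
# 3 0.5000000 0.5000000 0.0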
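
For what it's worth, `dist()` compares rows, so the 9-row long table gives a 9x9 result; with one row per customer it gives 3x3. Below is a rough base-R sketch of the same idea that also covers the "vice-versa" case (distances between recipes) and a dendrogram. The names `inc`, `cust_dist` and `recipe_dist` are just illustrative, and `hclust()` is only one possible way to get a cluster plot, not necessarily what you need.

```r
# Same toy data as above: V1 = customer, V2 = recipe.
data <- read.table(sep = ",", text = "1,A
2,B
1,C
2,C
2,B
3,A
3,B
3,C
3,D")

# Customer x recipe counts. table() keeps the customer IDs as row names,
# so no ID column ends up inside the distance computation.
inc <- unclass(table(data$V1, data$V2))

# Binary (Jaccard-type) distance between customers: a 3 x 3 matrix.
cust_dist <- as.matrix(dist(inc, method = "binary"))

# Transpose to compare recipes by the customers they share: a 4 x 4 matrix.
recipe_dist <- as.matrix(dist(t(inc), method = "binary"))

# One option for a cluster plot: hierarchical clustering drawn as a dendrogram.
plot(hclust(as.dist(cust_dist)), main = "Customers by shared recipes")
```

With `method = "binary"` any non-zero count is treated as "has this recipe", so the repeated `2,B` row does not change the result.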
lukeA
  • Worked perfectly, thanks. Do you have any recommendations, by chance, for how to create nodes and edges using this matrix? – Buddy Holly Jun 27 '16 at 22:19
  • Maybe `library(igraph);m <- data[-1];rownames(m) <- data[, 1];g <- graph_from_incidence_matrix(m, weighted=T);plot(g, vertex.color=V(g)$type, edge.width=E(g)$weight^2);get.data.frame(g)`? – lukeA Jun 27 '16 at 22:32
  • PS: I think it must be `(mat <- as.matrix(dist(data[-1], meth = "binary")) )` - the first column in the reshaped data should be excluded. – lukeA Jun 27 '16 at 22:33
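
The nodes-and-edges idea from the comments could be written out roughly as follows. This is only a sketch: it builds the graph from an edge list with `graph_from_data_frame()` instead of the incidence-matrix route suggested above, and the column names `customer`, `recipe` and `weight`, as well as the colours, are my own choices for illustration.

```r
library(igraph)

data <- read.table(sep = ",", text = "1,A
2,B
1,C
2,C
2,B
3,A
3,B
3,C
3,D")
names(data) <- c("customer", "recipe")

# One edge per distinct customer-recipe pair; weight = how often the pair occurs.
edges <- aggregate(weight ~ customer + recipe,
                   data = transform(data, weight = 1), FUN = sum)

# The first two columns become the edge endpoints, 'weight' an edge attribute.
g <- graph_from_data_frame(edges, directed = FALSE)

# Flag customer vertices so the two kinds of node can be coloured differently.
V(g)$type <- V(g)$name %in% as.character(edges$customer)

plot(g,
     vertex.color = ifelse(V(g)$type, "tomato", "lightblue"),
     edge.width = E(g)$weight * 2)

# Node and edge tables, e.g. for export to another plotting tool.
nodes_df <- igraph::as_data_frame(g, what = "vertices")
edges_df <- igraph::as_data_frame(g, what = "edges")
```

This assumes customer IDs and recipe names never collide (for example, no recipe literally called "1"); if they can, prefix one of the two columns before building the graph.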