
I'm building a K-nearest neighbors classifier, and I'd like to compute all of my distances at once, since the unvectorized version is taking a long time to run.

I have a test dataset of size 28000 examples x 784 features, and I have a training dataset of size 42000 examples x 784 features. The code that answers my question should result in a matrix of size 28000 x 42000, where every row contains the distance from that test example to each of the 42000 training examples.

The best I've come up with is using sum and bsxfun to compute all the distances at once for each test example, but I still need to loop through all 28000 examples, and as I said earlier it's taking a while.
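
For reference, the loop described above might look roughly like the following. This is only a sketch: the variable names `Xtest` and `Xtrain` are made up, small random matrices stand in for the real data, and Euclidean distance is assumed.

```matlab
% Small random stand-ins for the real data (hypothetical names and sizes)
Xtest  = rand(5, 784);   % real test set:     28000 x 784
Xtrain = rand(7, 784);   % real training set: 42000 x 784

nTest = size(Xtest, 1);
D = zeros(nTest, size(Xtrain, 1));
for i = 1:nTest
    % Subtract test row i from every training row
    diffs = bsxfun(@minus, Xtrain, Xtest(i, :));
    % Euclidean distance from test row i to each training row
    D(i, :) = sqrt(sum(diffs .^ 2, 2)).';
end
```

Each pass fills one row of `D`, so the loop runs once per test example.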

Shai
user1956609

1 Answer


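pdist2(A, B) will do what you need. It returns a matrix with one row per row of A and one column per row of B, so pass your test dataset as A and your training dataset as B to get the 28000 x 42000 layout you describe. Note that pdist2 is part of the Statistics Toolbox. Here is the reference: http://www.mathworks.com/help/stats/pdist2.html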
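
A minimal sketch of the call, assuming Euclidean distance (the `pdist2` default) and using made-up variable names with small random matrices in place of the real data. The row layout follows the first argument, so the test set goes first to get one row per test example:

```matlab
% Small random stand-ins for the real data (hypothetical names and sizes)
Xtest  = rand(5, 784);   % real test set:     28000 x 784
Xtrain = rand(7, 784);   % real training set: 42000 x 784

% pdist2 needs the Statistics Toolbox; Euclidean distance is the default
D = pdist2(Xtest, Xtrain);   % one row per test example, one column per training example

% For nearest-neighbor lookup: index of the closest training example per test row
[~, nearestIdx] = min(D, [], 2);
```

Keep in mind that the full 28000 x 42000 result has about 1.18 billion entries, which is roughly 9.4 GB in double precision. If that does not fit in memory, one option is to call `pdist2` on blocks of test rows and keep only the nearest neighbors from each block.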

s.bandara
  • How would this work with two separate matrices? The examples show pdist(X) – user1956609 Jan 08 '13 at 00:56
  • What you can use is `pdist2` right out of the box, not `pdist`. I had misread your question, but then corrected my answer. Sorry I sent you to the wrong documentation. – s.bandara Jan 08 '13 at 00:58