0

I want to use K-nearest neighbor approach to retrieve closest tree to input tree from data set. The node in tree have value but branches in each tree do not have label.

for example:

Tree 1: (S (V c) (N (P y)) (V (V o) (N (D t) (N d))))

Tree 2: (S (V (V p) (P (R o)) (N (D t) (N d))))

I want to use k-nearest neighbor for this problem. Are you have any idea to use this approach for this problem?

lejlot
  • 64,777
  • 8
  • 131
  • 164
SahelSoft
  • 615
  • 2
  • 9
  • 22

1 Answers1

2

You have to define a distance measure for trees to apply KNN algorithm. There are many possible metrices for trees to use, one of the most popular choices is tree edit distance ( How do I calculate tree edit distance? )

KNN is not a search algorithm. It is not used for finding the closest object, but rather - classifing object to one of the predefined labels. It simply searches for K nearest neighbours of given point X, and returns a label which have most of the neighbours.

For finding a closest tree, assuming that you already have defined the TED (tree edit distance) would be simply iterate through all tree in trees and select the one that minimizes TED(tree,X).

Very good resource regarding Tree Edit Distance is also located here: http://www.inf.unibz.it/dis/projects/tree-edit-distance/tree-edit-distance.php

Community
  • 1
  • 1
lejlot
  • 64,777
  • 8
  • 131
  • 164
  • Thanks for your answer. If i use tree edit distance, How(or where) i use KNN algorithm? – SahelSoft Aug 14 '13 at 19:06
  • updated answer = KNN is not a search algorithm, but rather - a supervised classification model – lejlot Aug 14 '13 at 19:11
  • I know KNN is for classification but I want to find a way to use it for finding closest tree. – SahelSoft Aug 14 '13 at 19:16
  • But you can not, this would be like "I know that this function is for counting number of letters 'a' but I want it to paint a happy face". `KNN` is not a search tool. Your problem can be (and should be) solved by one simple for loop. Of course from the theoretical point of view, you can create a training set in form of `(X,X)` (input is also a label, or `(X,Xid)` where `Xid` is `X`'s identifier), and ask a `KNN` with `K`=1 for a label, and it will return the closest tree, but **do not do it**, it would be highly artificial, inefficient and weird... yet possible. – lejlot Aug 14 '13 at 19:18
  • ok, thanks. How I can find a closest tree that it is like the input tree from root to leaf nodes? (I have 3 trees: tree 1 and tree 2 have similar leaf but the internal nodes are different. and tree 1 and tree 3 have different leaf but similar internal nodes. I want to use an algorithm to select tree 3 for tree 1.) – SahelSoft Aug 14 '13 at 19:26
  • You need to define `TED` accordingly, so that cost of edition near the root is smaller then the one on the deeper levels. – lejlot Aug 14 '13 at 19:29