
I have a set of text files in a particular domain. I need to rank the files based on some metric.

Please help me out with a few metrics that can be used to rank my text files (term frequency, size, frequency of use, etc.). I would then like to use text mining techniques to rank the files by one of these metrics.
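Of the metrics listed above, size and term frequency can be computed directly from the file contents. Frequency of use cannot, because it needs access logs or similar usage data that the files themselves do not contain. Below is a minimal sketch, assuming plain UTF-8 `.txt` files and a single query term. It computes a few such metrics and sorts the files by any one of them. The function and metric names (`file_metrics`, `rank_by`, `load_texts`, `term_freq`, and so on) are hypothetical.

```python
import os
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def file_metrics(text, term):
    """Compute a few simple (hypothetical) ranking metrics for one document."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    term = term.lower()
    return {
        "size_bytes": len(text.encode("utf-8")),
        "word_count": len(tokens),
        # Raw count of the term, and the same count normalised by document length
        "term_count": counts[term],
        "term_freq": counts[term] / len(tokens) if tokens else 0.0,
    }


def rank_by(docs, term, metric):
    """Return document names sorted from highest to lowest value of `metric`."""
    scores = {name: file_metrics(text, term)[metric] for name, text in docs.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def load_texts(directory):
    """Read every .txt file in `directory` into a {filename: text} dict."""
    docs = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                docs[name] = f.read()
    return docs


if __name__ == "__main__":
    # Small in-memory corpus standing in for real files
    docs = {
        "a.txt": "neural networks and more neural networks",
        "b.txt": "a long survey of retrieval models, ranking, and indexing methods",
        "c.txt": "neural ranking",
    }
    print(rank_by(docs, "neural", "term_freq"))   # ['c.txt', 'a.txt', 'b.txt']
    print(rank_by(docs, "neural", "word_count"))  # ['b.txt', 'a.txt', 'c.txt']
```

With real files, something like `rank_by(load_texts("corpus_dir"), "neural", "term_freq")` would rank a whole directory. The directory name and term are placeholders. Normalising the term count by document length keeps long files from winning purely because of their size.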

Pabluez
siddharth
  • What exactly are you trying to achieve? – Sergio Tulentsev Dec 20 '11 at 10:58
  • Explain better what you're trying to do and which language you're using, and please paste any code you've already written, along with the respective errors and questions. – Pabluez Dec 20 '11 at 20:25
  • I have a set of files in a particular domain and I need to rank them based on different metrics/criteria. I have to think of different metrics by which they can be ranked, and I am on the lookout for such metrics. – siddharth Dec 21 '11 at 04:53
  • I aim to find the best measure for ranking files in a particular domain. I want the computer to work like an expert scholar and rank the files in a repository. I haven't started coding, as I can't move forward without solving this issue. – siddharth Dec 21 '11 at 05:02

1 Answer


The major issue I came across was ranking the documents according to their relevance or some other metric.

I have now concluded that ranking documents by their content (relevance) gives better results.

I am using a vector-based approach to rank documents against the search words given in the query. I am not sure it is the best approach, but it gives results of average accuracy.
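The answer does not say which term weighting or similarity measure the vector-based approach uses. A common vector space setup is TF-IDF weighting with cosine similarity between the query vector and each document vector. The sketch below assumes that setup and uses only the Python standard library. The function names are hypothetical, and the smoothed IDF formula is one choice among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(tokenized_docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(tokenized_docs)
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (raw count * IDF), ignoring out-of-vocabulary terms."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(docs, query):
    """Return (name, score) pairs sorted by cosine similarity to the query."""
    names = list(docs)
    tokenized = [tokenize(docs[name]) for name in names]
    idf = idf_weights(tokenized)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [(name, cosine(query_vector, vec)) for name, vec in zip(names, doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "information retrieval ranks documents by relevance",
        "d2": "cooking recipes for pasta and bread",
        "d3": "vector space retrieval model for document relevance ranking",
    }
    for name, score in rank_documents(docs, "retrieval relevance"):
        print(f"{name}: {score:.3f}")
```

Here `d1` and `d3` both match the query, and `d1` ranks higher because it is shorter, so the matching terms carry more of its vector's weight. `d2` shares no terms with the query and scores 0. For larger collections, scikit-learn's `TfidfVectorizer` together with `sklearn.metrics.pairwise.cosine_similarity` does the same job more efficiently.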

siddharth
  • I'm still not certain from your question what you're trying to accomplish, but your answer here gives me a better sense. This might be helpful: it answers a slightly different (maybe) question, but it may still help. http://stackoverflow.com/a/2278780/321143 – Ellie Kesselman Dec 23 '11 at 03:41