
I am looking for a simple script that finds the frequency of each word in a given document (probably using a portable stemmer).

Is there a library or simple script that does this?

osi

2 Answers


Use nltk's FreqDist:

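import nltk

YOUR_STRING = "Your words"

# Split the text on whitespace to get a list of words
words = YOUR_STRING.split()
freq_dist = nltk.FreqDist(words)

# most_common() returns (word, count) pairs sorted by descending count
ranked = freq_dist.most_common()

# 50 most frequent words
most_frequent = [word for word, count in ranked[:50]]

# 50 least frequent words
least_frequent = [word for word, count in ranked[-50:]]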
MattoTodd

You should be able to count words. Use a collections.Counter or a dict, depending on what you need. That part is easy, but if it isn't, you can find the answer by searching on SO itself.
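
For reference, here is a minimal sketch of the counting step using only the standard library. The function name `word_frequencies` and the regex used to split the text into words are illustrative assumptions; adjust the pattern to whatever you consider a word.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercased word occurs in text."""
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


freqs = word_frequencies("The cat saw the other cat. The end.")
print(freqs.most_common(2))  # the two most frequent (word, count) pairs
```

A Counter behaves like a dict, so `freqs["cat"]` gives the count for a single word, and `most_common(n)` gives the top `n` entries.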

I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt
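
To combine stemming with counting, one option is to map every word through a stemmer before counting it. In the sketch below, `count_stems` and `toy_stem` are hypothetical names, and `toy_stem` is a crude suffix-stripping stand-in, not the Porter algorithm. If NLTK is installed, `nltk.stem.PorterStemmer().stem` could be passed as the `stem` argument instead.

```python
from collections import Counter


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_stems(words, stem=toy_stem):
    """Count words after mapping each one through the stem callable."""
    return Counter(stem(w.lower()) for w in words)


counts = count_stems("walk walks walked walking talk talks".split())
print(counts)
```

Taking the stemmer as a parameter keeps the counting logic separate, so a real Porter implementation can be swapped in without touching `count_stems`.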

Roshan Mathews
  • More recent versions of the same stemmer are in nltk. See http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk/stem/porter.py. – Steven Rumbalski Sep 20 '11 at 04:47