Is there a way to show which words in a given text are filler words using NLTK? If not, does anyone know where I can get a wordlist of English filler words? Thank you.

SOLVED: from nltk.corpus import stopwords
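
For anyone landing here later, a rough sketch of how that import might be used to flag the stop words in a text. The helper names (`load_stop_words`, `flag_stop_words`) and the tiny fallback list are my own assumptions for illustration, not part of NLTK. The NLTK corpus also has to be fetched once with `nltk.download("stopwords")`, which is why the sketch falls back to a small built-in set when it is missing.

```python
import re

# Tiny fallback list, used only when the NLTK corpus is unavailable.
# This is an illustrative assumption, not the NLTK list itself.
FALLBACK_STOP_WORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}


def load_stop_words():
    """Return NLTK's English stop words, or the small fallback set."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # nltk is not installed, or nltk.download("stopwords") has not been run
        return set(FALLBACK_STOP_WORDS)


def flag_stop_words(text, stop_words):
    """Pair each word in text with True if it is a stop word."""
    # Simple regex tokenizer so no extra NLTK tokenizer data is needed
    words = re.findall(r"[A-Za-z']+", text)
    return [(word, word.lower() in stop_words) for word in words]


if __name__ == "__main__":
    for word, is_stop in flag_stop_words("The cat is on the mat", load_stop_words()):
        print(word, "<- stop word" if is_stop else "")
```

The comparison is done on the lowercased word because the NLTK English list is all lowercase.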

Mirko

1 Answer

NLTK doesn't provide a profanity list by itself, though many are available elsewhere on the Web.

A web search for wordlists using terms such as "profanity", "badwords.txt" or "blacklists.txt" will turn up many sources.

In our company's case, we ended up creating our own list and adding to it as needed. Depending on your audience, the list has to be tweaked and adjusted.

Finally, even though this SO question is closed (and about PHP), I found its references and discussion very useful.

UPDATE: What you want is a list of STOP WORDS.

  1. Try: http://www.ranks.nl/resources/stopwords.html
  2. MIT also maintains a list of stop words.
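
If you download one of these lists, something like the following could turn it into a set you can test words against. This is only a sketch: it assumes the file has one word per line and treats lines starting with `#` as comments, which may not match the format of the lists above. `parse_word_list` and its `extra_words` parameter are hypothetical names of mine.

```python
def parse_word_list(lines, extra_words=()):
    """Build a lowercase stop-word set from an iterable of text lines.

    Assumes one word per line; blank lines and lines starting
    with '#' are skipped. extra_words lets you add your own entries.
    """
    words = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            words.add(entry)
    words.update(word.lower() for word in extra_words)
    return words


if __name__ == "__main__":
    # In practice the lines would come from an open file object
    sample = ["# sample list\n", "the\n", "of\n", "\n", "And\n"]
    print(sorted(parse_word_list(sample, extra_words=["um", "uh"])))
```

Passing an open file works as well, since iterating over a file yields its lines.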

Hope that helps.

Ram Narasimhan
  • Apparently the word "expletive" also means bad words. What I meant was not bad words but **filler words**. Is there somewhere to obtain such a list, or does NLTK provide it? – Mirko Dec 10 '12 at 07:39
  • Updated my answer based on your clarification – Ram Narasimhan Dec 10 '12 at 17:35