
In NLTK there are `BigramAssocMeasures`, `TrigramAssocMeasures` and `QuadgramAssocMeasures`.

But if I have 5-grams or 6-grams, is there something like 5gramAssocMeasures or 6gramAssocMeasures in NLTK?

Can someone help?

alvas
fff
  • Take a look at [rolling/sliding window iterators](http://stackoverflow.com/q/6822725/198633) – inspectorG4dget Oct 09 '15 at 18:13
  • It is something different. – fff Oct 09 '15 at 18:37
  • The link is not specific to nltk, but works on any generic list. You could use that to generate n-grams, once you have a list of words, which nltk does give you – inspectorG4dget Oct 09 '15 at 18:40
  • You can create them yourself by inheriting from `nltk.NgramAssocMeasures`. – lenz Oct 09 '15 at 18:43
  • @inspectorG4dget The question is not about generating n-grams (which can be easily achieved with `nltk.ngrams()`), but about a convenient "collection of {bi,tri,quad,...}gram association measures". – lenz Oct 09 '15 at 18:51
  • Possible duplicate of [QuingramAssocMeasures in NLTK](http://stackoverflow.com/questions/33054764/quingramassocmeasures-in-nltk) – alvas Oct 10 '15 at 21:05
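As inspectorG4dget's comments suggest, generating the n-grams themselves needs no special class. Below is a minimal sliding-window sketch over a list of words. The function name `sliding_ngrams` is hypothetical. It behaves roughly like `nltk.ngrams()` without padding. As lenz points out, this only produces the n-grams and not the association measures.

```python
def sliding_ngrams(words, n):
    """Return an iterator of n-tuples, moving a window one word at a time."""
    # zip stops at the shortest slice, so this yields len(words) - n + 1 tuples
    # (or none at all if the list is shorter than n)
    return zip(*(words[i:] for i in range(n)))


words = "the quick brown fox jumps over the lazy dog".split()
for gram in sliding_ngrams(words, 5):
    print(gram)
```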

1 Answer


You have to create them yourself.

Have a look at the source code of the association module. You can find it at <nltk>/metrics/association.py, where <nltk> stands for the directory of your NLTK installation.
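If you are unsure where <nltk> is on your system, Python can tell you. Here is a minimal sketch that uses only the standard `importlib` machinery. The helper name `association_module_path` is hypothetical. It returns `None` when NLTK is not installed.

```python
import importlib.util
from pathlib import Path


def association_module_path():
    """Return the path of NLTK's metrics/association.py, or None if NLTK is absent."""
    if importlib.util.find_spec("nltk") is None:
        return None
    # looking up a submodule's spec imports its parent packages (nltk, nltk.metrics)
    spec = importlib.util.find_spec("nltk.metrics.association")
    return Path(spec.origin) if spec is not None and spec.origin else None


print(association_module_path())
```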

Start with

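from nltk.metrics.association import NgramAssocMeasures

class QuingramAssocMeasures(NgramAssocMeasures):
    """
    A collection of 5-gram association measures.
    ...
    """
    _n = 5  # n-gram order, read by the inherited measures (e.g. pmi, chi_sq)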

or whatever you would like to call 5-grams. Then you need to define the methods that depend on the n-gram order, i.e. the ones that raise a `NotImplementedError` in the abstract class: `_contingency()` and `_marginals()`. You can look at the implementations for 3- and 4-grams and build these methods by analogy. Be prepared for a huge bulk of local variables, though: a 5-gram contingency table has 2^5 = 32 cells, and a 6-gram table has 64.
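A 5-gram contingency table has 2^5 = 32 cells. Rather than naming each one, you can derive both methods generically by inclusion-exclusion. What follows is a minimal sketch under stated assumptions, not NLTK code. The class name `GenericNgramContingency` is hypothetical. The marginal layout `(n_full, level_{n-1}, ..., level_1, n_total)` is a convention chosen here, with each level ordered by `itertools.combinations`. It mirrors the nested-tuple shape of NLTK's bigram and trigram measures. The unigram counts sit at index -2 and the total at index -1, which is where, as far as I can tell, inherited measures such as `pmi` look for them. For higher orders, however, the ordering may not match the one NLTK's `QuadgramAssocMeasures` uses, so whatever code computes your marginals must follow this same convention. Cells are indexed by a bitmask in which bit j set means "word j does not match". For bigrams this gives the `(n_ii, n_oi, n_io, n_oo)` order.

```python
from itertools import combinations


class GenericNgramContingency:
    """Sketch: _contingency/_marginals for an arbitrary n-gram order _n.

    Assumed marginal layout (a convention of this sketch):
        (n_full, level_{n-1}, ..., level_1, n_total)
    level_k has one count per k-subset S of word positions, ordered as
    itertools.combinations(range(_n), k): the number of windows whose
    words at the positions in S match the target n-gram.
    Contingency cell b (0 <= b < 2**_n): bit j set means word j does NOT match.
    """

    _n = 5  # n-gram order; 5 for "quingrams"

    @classmethod
    def _subset_counts(cls, marginals):
        """Map each frozenset of matching positions to its marginal count."""
        n = cls._n
        counts = {frozenset(range(n)): marginals[0], frozenset(): marginals[-1]}
        for k in range(n - 1, 0, -1):
            # level k is stored at index n - k (level n-1 first, level 1 last)
            for positions, value in zip(combinations(range(n), k), marginals[n - k]):
                counts[frozenset(positions)] = value
        return counts

    @classmethod
    def _contingency(cls, *marginals):
        """Exact cell counts from the subset marginals via inclusion-exclusion."""
        n = cls._n
        m = cls._subset_counts(marginals)
        cont = []
        for b in range(2 ** n):
            matching = frozenset(j for j in range(n) if not b & (1 << j))
            mismatching = [j for j in range(n) if b & (1 << j)]
            # cell = sum over T subset of mismatching of (-1)^|T| * m(matching | T)
            cell = 0
            for r in range(len(mismatching) + 1):
                for extra in combinations(mismatching, r):
                    cell += (-1) ** r * m[matching | frozenset(extra)]
            cont.append(cell)
        return tuple(cont)

    @classmethod
    def _marginals(cls, *contingency):
        """Inverse of _contingency: a marginal sums every cell in which S matches."""
        n = cls._n

        def marginal(positions):
            mask = sum(1 << j for j in positions)
            return sum(c for b, c in enumerate(contingency) if not b & mask)

        levels = [
            tuple(marginal(s) for s in combinations(range(n), k))
            for k in range(n - 1, 0, -1)
        ]
        return (marginal(range(n)), *levels, marginal(()))
```

To use this with NLTK, one could combine it with the abstract class. For example, `class QuingramAssocMeasures(GenericNgramContingency, NgramAssocMeasures): _n = 5`. Inherited measures such as `chi_sq` or `likelihood_ratio` would then call these methods. I have not tested that wiring. The inclusion-exclusion loop touches 3^n terms in total (243 for n = 5), which is negligible.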

lenz