I'm using NLTK to find a word in a text, and I need to save the result of the concordance function into a list. This question has already been asked here, but I can't see the changes. I tried to find the type of the value returned by the function with:

type(text.concordance('myword'))

The result was:

<class 'NoneType'>
alvas
Hayat Bellafkih
  • Possible duplicate of [Python: how to capture output to a text file? (only 25 of 530 lines captured now)](https://stackoverflow.com/questions/11044072/python-how-to-capture-output-to-a-text-file-only-25-of-530-lines-captured-now) – Jan Trienes Dec 05 '17 at 09:51
  • I already saw this post, but I'd prefer to avoid going through a file. – Hayat Bellafkih Dec 05 '17 at 10:23
  • Concordances can only be captured through the stdout, there's no way to save the concordance yet but there's a PR to do so: https://github.com/nltk/nltk/pull/1333 – alvas Dec 05 '17 at 12:13
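Since the concordance only goes to stdout, one way to get it into a list without touching a file is to capture stdout in memory. The sketch below uses the standard library's `contextlib.redirect_stdout` with an `io.StringIO` buffer; `capture_lines` is a hypothetical helper name, not part of NLTK.

```python
import contextlib
import io


def capture_lines(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return what it printed as a list of lines.

    Hypothetical helper (not part of NLTK): it works for any function that
    prints its result instead of returning it, e.g. Text.concordance.
    """
    buffer = io.StringIO()
    # Everything written to sys.stdout inside this block goes to the buffer
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    # Split the captured text into one string per printed line
    return buffer.getvalue().splitlines()
```

With an NLTK `Text` object you would call something like `capture_lines(text.concordance, 'myword', lines=1000)`. Raising `lines` above its default of 25 matters, because `concordance` otherwise prints only the first 25 matches. The first captured line should be NLTK's "Displaying ... matches:" header rather than a concordance line, so you may want to drop it.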

3 Answers

By inspecting the source of ConcordanceIndex, we can see that the results are printed to stdout. If redirecting stdout to a file is not an option, you have to reimplement ConcordanceIndex.print_concordance so that it returns the results rather than printing them.

Code:

def concordance(ci, word, width=75, lines=25):
    """
    Rewrite of nltk.text.ConcordanceIndex.print_concordance that returns results
    instead of printing them. 

    See:
    http://www.nltk.org/api/nltk.html#nltk.text.ConcordanceIndex.print_concordance
    """
    half_width = (width - len(word) - 2) // 2
    context = width // 4 # approx number of words of context

    results = []
    offsets = ci.offsets(word)
    if offsets:
        lines = min(lines, len(offsets))
        for i in offsets:
            if lines <= 0:
                break
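            # max(0, ...) prevents a negative start index, which would make the
            # slice count from the end of the list and lose the left context
            left = (' ' * half_width +
                    ' '.join(ci._tokens[max(0, i - context):i]))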
            right = ' '.join(ci._tokens[i+1:i+context])
            left = left[-half_width:]
            right = right[:half_width]
            results.append('%s %s %s' % (left, ci._tokens[i], right))
            lines -= 1

    return results

Usage:

from nltk.book import text1
from nltk.text import ConcordanceIndex

ci = ConcordanceIndex(text1.tokens)
results = concordance(ci, 'circumstances')

print(type(results))  # <class 'list'>
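The same keyword-in-context idea can also be written without relying on NLTK's private `_tokens` attribute, by building your own word-to-positions index over a plain token list. This is a minimal standalone sketch, not NLTK's implementation; `build_index` and `kwic` are hypothetical names, and lowercasing the index keys imitates the case-insensitive matching that `Text.concordance` uses.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each lowercased token to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token.lower()].append(position)
    return index


def kwic(tokens, index, word, width=75, lines=25):
    """Return up to `lines` keyword-in-context strings for `word`."""
    half_width = (width - len(word) - 2) // 2
    context = width // 4  # approximate number of words on each side
    results = []
    for i in index.get(word.lower(), [])[:lines]:
        # max(0, ...) avoids a negative start index for matches near the start
        left = " ".join(tokens[max(0, i - context):i])
        right = " ".join(tokens[i + 1:i + context])
        # Pad and trim the left side so every keyword lands in the same column
        left = (" " * half_width + left)[-half_width:]
        right = right[:half_width]
        results.append(f"{left} {tokens[i]} {right}")
    return results
```

Building the index once and reusing it for many queries keeps each lookup cheap, which is essentially why NLTK caches a `ConcordanceIndex` per `Text`.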
Hayat Bellafkih
Jan Trienes
  • Is this feature something that people like to see in NLTK? If so, give some love to https://github.com/nltk/nltk/pull/1333 and we'll see how far/fast we can push this into proper nltk function =) – alvas Dec 05 '17 at 12:14
  • There is now a function that returns a list. See my answer – robertspierre Jul 22 '19 at 11:32

The Text class now has a concordance_list method, which returns the concordance as a list instead of printing it. For example:

from nltk.corpus import gutenberg
from nltk.text import Text

corpus = gutenberg.words('melville-moby_dick.txt')
text = Text(corpus)
con_list = text.concordance_list("monstrous")
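The elements of that list are `ConcordanceLine` named tuples rather than plain strings. The sketch below, assuming an NLTK version recent enough to have `concordance_list`, uses a tiny hand-made token list so no corpus download is needed, and shows how to read the fields and get strings like the ones `concordance()` prints.

```python
from nltk.text import Text

# A small hand-made token list, so no corpus download is needed
tokens = "the whale was a monstrous fish and the sea was monstrous too".split()
text = Text(tokens)

# Each element is a ConcordanceLine named tuple, not a string
con_list = text.concordance_list("monstrous", width=40, lines=5)

for entry in con_list:
    # offset: token position of the match; line: the formatted context string
    print(entry.offset, entry.line)

# Plain strings, similar to what text.concordance() prints
strings = [entry.line for entry in con_list]
```

The tuples also carry `left` and `right` (the context tokens as lists), which is handy if you want to post-process the context rather than just display it.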
robertspierre

To use concordance, you need to instantiate an NLTK Text object and then call the concordance() method on that object:

import nltk.corpus
from nltk.text import Text
moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))

Here we instantiate a Text object on the text file melville-moby_dick.txt, and then we can call the method:

moby.concordance("monster")

If you get NoneType here, it seems to be because you did not create a Text object, so your variable text is None.

Elliot
  • I did not post the complete code, but text was an NLTK Text object. I posted this line only to check the return type of the concordance method.