Background: The following code performs a toy bigram analysis:

import nltk
from nltk import bigrams
from nltk.tokenize import word_tokenize

text = "some nice words go here"
tokens = word_tokenize(text)
bi_tokens = bigrams(tokens)

bi_count = {}
for token in bi_tokens:
    if token not in bi_count:
        bi_count[token] = 1
    else:
        bi_count[token] += 1

Output:

print(bi_count)

{('go', 'here'): 1,
 ('nice', 'words'): 1,
 ('some', 'nice'): 1,
 ('words', 'go'): 1}

Problem: I would like to use a key (e.g. ('go', 'here')) to get the corresponding value (e.g. 1).

I have tried searching http://www.nltk.org/api/nltk.html?highlight=freqdist and also the question "How to access specific element of dictionary of tuples", but I have not been able to find the answer.

Question: Is there a way to solve my problem by using an nltk method or by any other means?

alvas

2 Answers

search_key = ('go', 'here')
for key, value in bi_count.items(): 
    if key == search_key:
        print(value)  # prints 1
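
The loop above does work, but a dictionary can also be indexed directly with the tuple key, which avoids scanning every item. A minimal sketch, assuming the same counts as in the question (written out literally here so it runs without nltk; the ("not", "present") key is purely illustrative):

```python
# Same bigram counts as in the question, written as a literal dict
# (assumption: this stands in for the bi_count built by the loop).
bi_count = {
    ("some", "nice"): 1,
    ("nice", "words"): 1,
    ("words", "go"): 1,
    ("go", "here"): 1,
}

search_key = ("go", "here")

# Direct lookup: tuples are hashable, so they work as dict keys.
value = bi_count[search_key]

# .get() avoids a KeyError when the bigram was never seen.
missing = bi_count.get(("not", "present"), 0)

print(value)
print(missing)
```

Indexing with `bi_count[key]` raises `KeyError` for an unseen bigram, so `.get()` with a default is probably the safer choice when the key may be absent.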
0
>>> from collections import Counter
>>> from nltk import bigrams, word_tokenize
>>> text = "some nice words go here"

# Count no. of ngrams
>>> bigram_counter = Counter(bigrams(word_tokenize(text)))

# Iterate through the ngrams and their counts.
>>> for bg, count in bigram_counter.most_common():
...     print(bg, count)
... 
('some', 'nice') 1
('go', 'here') 1
('words', 'go') 1
('nice', 'words') 1

Answer:

# Access the Counter object. 
>>> bigram_counter[('some', 'nice')]
1
>>> bigram_counter[('words', 'go')]
1
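
One detail that may be useful here: unlike a plain dict, a `Counter` returns 0 for a key it has never seen instead of raising `KeyError`. A small sketch of that behaviour, assuming `text.split()` and `zip` as stdlib stand-ins for `word_tokenize` and `bigrams` so that it runs without nltk installed:

```python
from collections import Counter

text = "some nice words go here"

# Assumption: whitespace splitting stands in for nltk's word_tokenize.
tokens = text.split()

# Assumption: zip over adjacent tokens stands in for nltk's bigrams.
bigram_counter = Counter(zip(tokens, tokens[1:]))

# A bigram that occurs in the text.
present = bigram_counter[("go", "here")]

# A bigram that never occurs: Counter returns 0 rather than raising.
absent = bigram_counter[("here", "go")]

print(present, absent)
```

Looking up a missing key this way does not add it to the counter, so membership tests such as `("here", "go") in bigram_counter` still return `False` afterwards.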

Take a look at

alvas