
I am doing a course on NLTK in Python that has a hands-on problem (on Katacoda) on "Text Corpora", and it is not accepting my solution below. I have been stuck on this problem for a long time and need to complete this hands-on to proceed further in the course.

Problem Definition

  1. Import the text corpus brown.
  2. Extract the list of tagged words from the corpus brown. Store the result in brown_tagged_words.
  3. Generate trigrams of brown_tagged_words and store the result in brown_tagged_trigrams.
  4. For every trigram of brown_tagged_trigrams, determine the tags associated with each word. This results in a list of tuples, where each tuple contains the POS tags of 3 consecutive words occurring in the text. Store the result in brown_trigram_pos_tags.
  5. Determine the frequency distribution of brown_trigram_pos_tags and store the result in brown_trigram_pos_tags_freq.
  6. Print the number of occurrences of the trigram ('JJ','NN','IN').
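The six steps can be traced on a tiny example without downloading the corpus. The sketch below is a minimal stand-in, assuming a hand-made list of (word, tag) pairs in place of `brown.tagged_words()`, `zip` in place of `nltk.trigrams`, and `collections.Counter` in place of `nltk.FreqDist`; the sample words and the helper name `make_trigrams` are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for brown.tagged_words(): a list of (word, tag) pairs.
brown_tagged_words = [
    ("a", "AT"), ("big", "JJ"), ("dog", "NN"), ("in", "IN"),
    ("the", "AT"), ("old", "JJ"), ("house", "NN"), ("of", "IN"), ("cards", "NNS"),
]

def make_trigrams(seq):
    """Yield consecutive triples, like nltk.trigrams (hypothetical helper)."""
    return zip(seq, seq[1:], seq[2:])

# Step 3: trigrams of tagged words.
brown_tagged_trigrams = list(make_trigrams(brown_tagged_words))

# Step 4: keep only the POS tag (index 1) of each of the three words.
brown_trigram_pos_tags = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in brown_tagged_trigrams]

# Step 5: frequency distribution of the tag trigrams.
brown_trigram_pos_tags_freq = Counter(brown_trigram_pos_tags)

# Step 6: occurrences of ('JJ', 'NN', 'IN').
print(brown_trigram_pos_tags_freq[("JJ", "NN", "IN")])  # 2
```

On the real corpus the same shape of pipeline applies, with the NLTK calls from the steps above swapped back in.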

For this, I have tried the solution below:
import nltk
from nltk.corpus import brown
brown_tagged_words = [w for w in brown.tagged_words()]
brown_tagged_trigrams = nltk.trigrams(brown_tagged_words)
brown_trigram_pos_tags = [(w1[1],w2[1],w2[1]) for w1,w2,w3 in brown_tagged_trigrams]
brown_trigram_pos_tags_freq = nltk.FreqDist(brown_trigram_pos_tags)
print(brown_trigram_pos_tags_freq[('JJ', 'NN', 'IN')])
kenorb

3 Answers


brown_trigram_pos_tags = [(w1[1],w2[1],w3[1]) for w1,w2,w3 in brown_tagged_trigrams]

Here, the third element is changed from w2[1] to w3[1]; this will give a value of around 8.
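To see why the question's comprehension returns 0, the two versions can be compared on a small hypothetical tagged list (not the Brown corpus). With `w2[1]` in the third slot, every tuple has identical second and third tags, so a trigram such as ('JJ', 'NN', 'IN') can never be produced. Unpacking the tags directly in the loop header, as in the fixed line below, is one way to avoid this kind of index typo; that style is a suggestion, not part of the original answer.

```python
from collections import Counter

# Hypothetical toy data standing in for brown.tagged_words().
tagged = [("a", "AT"), ("big", "JJ"), ("dog", "NN"), ("in", "IN"), ("the", "AT")]
trigrams = list(zip(tagged, tagged[1:], tagged[2:]))

# Buggy: the third element repeats w2's tag.
buggy = Counter((w1[1], w2[1], w2[1]) for w1, w2, w3 in trigrams)

# Fixed: unpack each (word, tag) pair so every tag gets its own name.
fixed = Counter((t1, t2, t3) for (_, t1), (_, t2), (_, t3) in trigrams)

target = ("JJ", "NN", "IN")
print(buggy[target], fixed[target])  # 0 1
```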

Trenton McKinney
Karthik

Try this:

('IN', 'AT', 'AT')

You will get the result: 43271

You are getting 0 because there is no occurrence of ('JJ', 'NN', 'IN').
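A count of 0 here does not raise an error because `nltk.FreqDist` is a subclass of `collections.Counter`, and a `Counter` returns 0 for keys it has never seen. The sketch below uses a plain `Counter` with made-up tag tuples shaped like the question's output (second and third tags always equal) to show that ('IN', 'AT', 'AT') can be counted while ('JJ', 'NN', 'IN') cannot.

```python
from collections import Counter

# Hypothetical tag tuples in the shape the question's code produces:
# (tag1, tag2, tag2), because the third slot repeats w2's tag.
question_style_tags = [("IN", "AT", "AT"), ("JJ", "NN", "NN"), ("IN", "AT", "AT")]
freq = Counter(question_style_tags)

# A key that was generated is counted normally.
print(freq[("IN", "AT", "AT")])  # 2

# A missing key returns 0 instead of raising KeyError.
print(freq[("JJ", "NN", "IN")])  # 0
```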

Chandan Gupta
import nltk
from nltk.corpus import brown
brown_tagged_words = brown.tagged_words()
brown_tagged_trigrams = [(w1,w2,w3) for w1,w2,w3 in nltk.trigrams(brown_tagged_words)]
# take the POS tag (index 1) of each of the three words
brown_trigram_pos_tags = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in brown_tagged_trigrams]
brown_trigram_pos_tags_freq = nltk.FreqDist(brown_trigram_pos_tags)
print(brown_trigram_pos_tags_freq[('JJ', 'NN', 'IN')])

Try this...
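One difference from the question's code is that this answer wraps `nltk.trigrams(...)` in a list comprehension. `nltk.trigrams` returns a lazy generator that can be iterated only once, so materializing it into a list matters if `brown_tagged_trigrams` is used more than once (for example, inspected and then fed into step 4). The sketch below illustrates this with a `zip`-based stand-in for `nltk.trigrams` on hypothetical toy data.

```python
# Hypothetical toy data; zip() stands in for nltk.trigrams, which is also lazy.
tagged = [("a", "AT"), ("big", "JJ"), ("dog", "NN"), ("in", "IN")]

lazy = zip(tagged, tagged[1:], tagged[2:])
first_pass = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in lazy]
second_pass = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in lazy]  # iterator already exhausted
print(len(first_pass), len(second_pass))  # 2 0

# Materializing once into a list allows repeated iteration.
materialized = list(zip(tagged, tagged[1:], tagged[2:]))
pass_a = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in materialized]
pass_b = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in materialized]
print(len(pass_a), len(pass_b))  # 2 2
```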

PKM
Welcome to Stack Overflow! While this may answer the question, it is better to give explanations and not only code in your answer. – gogaz May 24 '20 at 17:21