
Hi :) I am really new to Python and NLP and am working through the NLTK book from O'Reilly. I'm currently stuck on the task about plotting and tabulating with a Conditional Frequency Distribution. The task is the following: "find out which days of the week are most newsworthy, and which are most romantic. Define a variable called days containing a list of days of the week, i.e. ['Monday', ...]. Now tabulate the counts for these words using cfd.tabulate(samples=days). Now try the same thing using plot in place of tabulate. You may control the output order of days with the help of an extra parameter: samples=['Monday', ...]."

This is my code:

import nltk
from nltk.corpus import brown
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
genre_day = [(genre, day)
             for genre in ['news', 'romance']
             for day in days]
cfd = nltk.ConditionalFreqDist(genre_day)
tabulated = cfd.tabulate(conditions=['news', 'romance'],
                         sample=days, cumulative=True)

What I have as an outcome is this:

[Image: what I got]

Could someone please explain why I get this data instead of a count of how often each word is used per genre in the corpus? I would be very grateful for any help.

k_bedryk

1 Answer


The list comprehension that you are passing to ConditionalFreqDist:

(genre, day)
for genre in ['news', 'romance']
for day in days

This produces a list of pairs containing every genre with every day, something like [('news', 'Sunday'), ('news', 'Monday'), ..., ('romance', 'Saturday')]. It never reads the corpus, so each genre gets a count of exactly one for each day. Since you are passing True to the cumulative parameter, the table just shows the running sum of those ones.
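A quick way to check this is the sketch below. It uses plain Python only: `collections.Counter` and `itertools.accumulate` are stand-ins (my assumption, not NLTK itself) for what `ConditionalFreqDist` and `cumulative=True` do. No corpus is read at any point.

```python
from collections import Counter
from itertools import accumulate

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# The comprehension from the question: it never looks at the Brown corpus.
genre_day = [(genre, day)
             for genre in ['news', 'romance']
             for day in days]

# Count each (genre, day) pair; every pair occurs exactly once.
pair_counts = Counter(genre_day)

# Running total over the days for one genre, mimicking cumulative=True.
news_cumulative = list(accumulate(pair_counts[('news', day)] for day in days))

print(len(genre_day))     # 14 pairs: 2 genres x 7 days
print(news_cumulative)    # [1, 2, 3, 4, 5, 6, 7]
```

The running totals 1 to 7 come purely from the length of the days list, not from any text.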

To count the occurrences of each weekday in the text, you should instead use

(genre, day)
for genre in ["news", "romance"]
for word in brown.words(categories=genre)
for day in days
if word == day

For each genre, it iterates through the words of that genre, and the pair (genre, day) is added to the list whenever the word is one of the days.

Let's say the text is "Sunday Apple Sunday" in the genre "news". The list comprehension will produce [("news","Sunday"), ("news","Sunday")], and get a count of 2 for "Sunday".
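That toy case can be run as a small sketch. Here `toy_words` is a hypothetical stand-in for `brown.words(categories=genre)`, and a `defaultdict` of `Counter` objects stands in for `ConditionalFreqDist`, so nothing needs to be downloaded.

```python
from collections import Counter, defaultdict

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# Hypothetical stand-in for brown.words(categories=genre).
toy_words = {'news': ['Sunday', 'Apple', 'Sunday']}

# Same shape as the corrected comprehension: keep only words that are days.
pairs = [(genre, day)
         for genre in toy_words
         for word in toy_words[genre]
         for day in days
         if word == day]

# Per-genre counts, playing the role of the conditional frequency distribution.
counts = defaultdict(Counter)
for genre, day in pairs:
    counts[genre][day] += 1

print(pairs)                      # [('news', 'Sunday'), ('news', 'Sunday')]
print(counts['news']['Sunday'])   # 2
```

With the real corpus, the same pairs would be built from brown.words(categories=genre), passed to nltk.ConditionalFreqDist, and shown with cfd.tabulate(samples=days), as the task describes.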

ShengXue