Questions tagged [nltk-book]

48 questions
7
votes
1 answer

How to handle words that have a space between characters?

I am using nltk.word_tokenize on Dari text. The problem is that some single words contain a space. For example, the word "زنده گی", which means life, and there are many other such words. For all words that end with the character "ه" we have to give a…
The Afghan
  • 99
  • 1
  • 7
4
votes
1 answer

How can I find a specific bigram using nltk in python?

I am currently working with nltk.book in Python and would like to find the frequency of a specific bigram. I know there is the bigrams() function that gives you the bigrams in the text, as in this code: >>> list(bigrams(['more', 'is',…
Jennifer
  • 47
  • 4
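
A minimal sketch of one way to count a specific bigram, using only the standard library rather than NLTK's own helpers. The token list and the `target` pair are made up for illustration; `bigram_counts` is a hypothetical helper name, not an NLTK function.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent token pairs; zip pairs each token with its successor."""
    return Counter(zip(tokens, tokens[1:]))

# Illustrative token list (an assumption, not taken from the question).
tokens = "more is said than done and more is promised than done".split()
counts = bigram_counts(tokens)

target = ("more", "is")  # hypothetical bigram of interest
print(counts[target])    # a Counter returns 0 for pairs that never occur
```

Because a `Counter` is keyed by the pair itself, looking up one bigram does not require scanning the most-common list.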
4
votes
1 answer

nltk "OMW" wordnet with Arabic language

I'm working in Python/NLTK with the (OMW) wordnet, specifically for the Arabic language. All the functions work fine with the English language, yet I can't seem to perform any of them when I use the 'arb' tag. The only thing that works great is…
user2340286
  • 69
  • 1
  • 6
2
votes
1 answer

How to use edit_distance() from nltk.metrics in this example?

I have a bit of a problem with using edit_distance() in the following example. I need to print words from the languages mentioned in the languages list in 5 columns, which is not a problem. I have done that: from nltk.corpus import swadesh from…
White
  • 51
  • 6
2
votes
1 answer

Porter and Lancaster stemming clarification

I am doing stemming using Porter and Lancaster and I find these observations: Input: replied, Porter: repli, Lancaster: reply. Input: twice, Porter: twice, Lancaster: twic. Input: came, Porter: came, Lancaster: cam. Input: In, Porter: …
floss
  • 2,603
  • 2
  • 20
  • 37
2
votes
1 answer

When building Feature based grammar, why do I get "invalid syntax" error?

Why do I get "invalid syntax" in the line with the % start S? nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg') % start S S -> NP[NUM=?n] VP[NUM=?n] # NP expansion productions NP[NUM=?n] -> PropN[NUM=?n] NP[NUM=?n] -> Det[NUM=?n]…
nefeli
  • 31
  • 1
2
votes
0 answers

span_tokenize gives generator object as output

I have written a simple piece of code to see exactly how the span_tokenize function works. Documentation for this can be found here: http://www.nltk.org/api/nltk.tokenize.html Here is my piece of code import nltk from nltk.tokenize.api import…
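
A generator object is the expected result of a lazy tokenizer; the spans only appear once it is consumed. The sketch below illustrates this with a standard-library stand-in, not NLTK's implementation: this `span_tokenize` is a hypothetical whitespace-based function and the sample text is an assumption.

```python
import re

def span_tokenize(text):
    """Yield (start, end) offsets of whitespace-delimited tokens, lazily."""
    for match in re.finditer(r"\S+", text):
        yield match.span()

text = "Good muffins cost $3.88"   # illustrative input
spans = span_tokenize(text)        # a generator object; nothing computed yet
span_list = list(spans)            # consuming it materializes the offsets

# Slice the original string with each span to recover the tokens.
tokens = [text[start:end] for start, end in span_list]
print(span_list)
print(tokens)
```

Wrapping the call in `list(...)`, or iterating over it in a `for` loop, is usually all that is needed to see the offsets.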
2
votes
0 answers

Setting up nltk proxy

I was following the first chapter of the NLTK book. It asks us to install the book corpus by running nltk.download(). I am getting a "getaddrinfo failed" error while running nltk.download(). After reading online, I learned that this has something to do…
Mahesha999
  • 22,693
  • 29
  • 116
  • 189
2
votes
1 answer

Best way to understand the input text before applying ngram

Currently I am reading text from an Excel file and applying bigrams to it. finalList, used in the sample code below, holds the list of input words read from the input Excel file. I removed the stopwords from the input with the help of the following library: from…
Pyd
  • 6,017
  • 18
  • 52
  • 109
1
vote
1 answer

Building a Character-Level Ngram Language Model with NLTK

I'm trying to build a language model on the character level with NLTK's KneserNeyInterpolated function. What I have is a frequency list of words in a pandas dataframe, with the only column being its frequency (the word itself is the index). I've…
JaP
  • 87
  • 6
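
A rough sketch of the preprocessing step such a model needs: turning a word-frequency list into frequency-weighted character n-gram counts. It uses only the standard library and stops short of any smoothing. The `word_freqs` data, the `_` padding symbol, and the helper name `char_ngrams` are assumptions for illustration.

```python
from collections import Counter

def char_ngrams(word, n, pad="_"):
    """Return the character n-grams of a word, padded on both sides."""
    padded = pad * (n - 1) + word + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Hypothetical word -> frequency mapping standing in for the dataframe.
word_freqs = {"cat": 3, "car": 2}

ngram_counts = Counter()
for word, freq in word_freqs.items():
    for gram in char_ngrams(word, 2):
        ngram_counts[gram] += freq  # weight each n-gram by the word's frequency

print(ngram_counts.most_common())
```

Weighting by frequency avoids repeating each word as many times as it occurs before counting.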
1
vote
1 answer

Conditional Frequency Distribution

Hi :) I am really new to Python and NLP and am now trying to go through the NLTK book from O'Reilly. I'm currently at a dead end with the task concerning plotting and tabulating with Conditional Frequency Distribution. The task is the following: "find…
k_bedryk
  • 11
  • 1
1
vote
0 answers

Change name of any state, county, regions, or their abbreviations to country name in python NLTK or other packages

I have a list of locations that is mixed with states, cities and countries, counties and regions, in abbreviations and some in full. For instance, NY, CA, England, UK, USA, Minnesota, London, Bradford, etc. I want it all to be converted to countries…
1
vote
0 answers

What is the more natural parse, the one that leads to the preferred reading of the sentence?

I have these rules: and these two possible parse trees: I am asked the following question: What is the more natural parsing, the one that leads to the preferred reading of the sentence? Can anyone explain to me what is more natural in English and…
Ilya.K.
  • 291
  • 1
  • 13
1
vote
0 answers

How to go from type theory to first-order logic lambda-expressions

As can be seen in the O'Reilly NLTK book, Chapter 10, when I want to translate the syntax tree of the sentence “Bob loves Alice” into first-order logic lambda-expressions, I get the following: where on the left I have the tree of types and on the…
yannis
  • 819
  • 1
  • 9
  • 26
1
vote
2 answers

'word' not in Vocabulary in a corpus with words shown in a single list only in gensim library

Hello Community Members, At present, I am implementing the Word2Vec algorithm. First, I extracted the data (sentences), split the sentences into tokens (words), removed the punctuation marks, and stored the tokens in a single list. The…
M S
  • 894
  • 1
  • 13
  • 41