How do I break a sentence into individual words in Python? What is this task called in NLP (Natural Language Processing), and should I use NLTK or a parser for it? I am confused about which method to use in my case.
You should definitely visit: http://www.nltk.org/book/ch03.html – 404pio Mar 09 '15 at 12:42
2 Answers
If you want to find all the words a sentence contains, i.e. tokenization, use NLTK:
import nltk
tokens = nltk.word_tokenize(sentence)
Note that a simple split on whitespace, sentence.split(), gives worse results.
In particular, 'This quickly comes into problems when an abbreviation is processed. “etc.” would be interpreted as a sentence terminator, and “U.N.E.S.C.O.” would be interpreted as six individual sentences, when both should be treated as single word tokens. How should hyphens be interpreted? What about speech marks and apostrophes?'
Or take a look at another source: "you chop on whitespace and throw away punctuation characters. This is a starting point, but even for English there are a number of tricky cases. For example, what do you do about the various uses of the apostrophe for possession and contractions?
Mr. O'Neill thinks that the boys' stories about Chile's capital aren't amusing.
A simple strategy is to just split on all non-alphanumeric characters, but while "o neill" looks okay, "aren t" looks intuitively bad."
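To make the quoted problem concrete, here is a small sketch using only the standard library (no NLTK), run on the example sentence from the quote. The variable names are mine, and the regular expression is just one way to "split on all non-alphanumeric characters"; it is not what NLTK does internally.

```python
import re

sentence = "Mr. O'Neill thinks that the boys' stories about Chile's capital aren't amusing."

# Naive whitespace split: punctuation stays glued to the neighbouring word.
whitespace_tokens = sentence.split()

# Split on every run of non-alphanumeric characters and drop empty strings.
# This breaks "O'Neill" into "O" + "Neill" and "aren't" into "aren" + "t".
alnum_tokens = [t for t in re.split(r"[^A-Za-z0-9]+", sentence) if t]

print(whitespace_tokens)
print(alnum_tokens)
```

Neither result is a clean word list, which is why a dedicated tokenizer is usually the better choice for real text.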

Without using the Natural Language Toolkit (NLTK), you can use Python's built-in str.split() as follows.
>>> line="a sentence with a few words"
>>> line.split()
['a', 'sentence', 'with', 'a', 'few', 'words']
>>>
This is taken from the question "How to split a string into a list?".
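If the input contains punctuation and you still want to stay within the standard library, a regular expression is one possible workaround. This is only a rough sketch: the sample string and the pattern are my own assumptions, and the pattern handles plain ASCII words with internal apostrophes, not abbreviations or hyphenated words.

```python
import re

line = "a sentence, with a few words; isn't it?"

# Plain split() leaves commas, semicolons and question marks attached.
split_words = line.split()

# Match runs of letters/digits, allowing apostrophes inside a word
# so that contractions such as "isn't" stay in one piece.
words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", line)

print(split_words)
print(words)
```

For anything beyond simple cases like this, a tokenizer such as the one in NLTK is likely to be more reliable.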
